PREFACE 


Historically, array transform processors have been 
largely integer-arithmetic devices, since the 
slower processing rate of floating-point arithmetic 
was undesirable when working with large arrays of 
data. However, integer methods have problems which 
make programming awkward due to the limited dynamic 
range of integer arithmetic. Array scaling and 
block floating-point techniques either allowed 
human and other errors to creep into the results or 
were costly and time consuming. Further, as 
processing became more sophisticated, even 16-bit 
integer data words were insufficiently precise for 
preserving the accuracy of simple 8-bit 
analog-to-digital converted input data. This is 
because the many multiplications and additions in 
typical cascaded array processing can cause the 
propagation of truncation errors. 


NOTE 


A 16-bit integer multiplied by 
a 16-bit integer results in a 
32-bit product. If the result 
is truncated to the 16 most 
significant bits, then half the 
time the resultant’s least 
significant bit (LSB) is wrong 
since it should have been 
rounded up. Now the product of 
two of these potentially wrong 
LSB numbers results in the next 
LSB being wrong part of the 
time; thus cascaded operations 
propagate the errors leftward 
toward the most significant 
bits. 
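
The effect described in the note can be illustrated with a short simulation. The sketch below is not taken from the original text: it assumes unsigned 16-bit fractions, a chain of 20 multiplications, and factors close to full scale, and the function names are hypothetical. It compares truncating each 32-bit product with rounding it, and measures the accumulated error in LSBs against exact rational arithmetic.

```python
from fractions import Fraction
import random

WORD = 16  # bits per integer data word (assumed unsigned fractions)

def mul_truncate(a, b):
    # Keep the 16 most significant bits of the 32-bit product.
    return (a * b) >> WORD

def mul_round(a, b):
    # Add half an LSB before discarding the low 16 bits.
    return (a * b + (1 << (WORD - 1))) >> WORD

def cascade_error(factors, multiply):
    """Error, in LSBs, of a cascaded product versus exact arithmetic."""
    acc = factors[0]
    exact = Fraction(factors[0])
    for f in factors[1:]:
        acc = multiply(acc, f)
        exact = exact * f / (1 << WORD)
    return float(acc - exact)

def mean_errors(trials=200, length=20, seed=1):
    """Mean error of truncation and of rounding over random cascades."""
    rng = random.Random(seed)
    t_sum = r_sum = 0.0
    for _ in range(trials):
        factors = [rng.randint(60000, 65535) for _ in range(length)]
        t_sum += cascade_error(factors, mul_truncate)
        r_sum += cascade_error(factors, mul_round)
    return t_sum / trials, r_sum / trials

if __name__ == "__main__":
    truncated, rounded = mean_errors()
    print("mean truncation error (LSBs):", truncated)
    print("mean rounding error (LSBs):  ", rounded)
```

Under these assumptions the truncated cascade is always at or below the exact value, so its error accumulates in one direction, while the rounded cascade's errors largely cancel.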


With the advent of faster digital logic, many users 
realized that floating-point processing makes 
programming easier, virtually eliminates dynamic 
range problems, greatly alleviates the precision 
problem, and is potentially as fast as the last 
generation of integer processors. Floating Point 
Systems, Inc., recognized this trend in 1970 and 
was formed to specialize in floating-point 
processors. 


The rush to floating-point processing was not a 
smooth one. Many floating-point formats sprang up 
and Floating Point Systems became expert in format 
converting on-the-fly so processing time would not 
be lost during a format conversion. Why convert 
formats? Simple. Not all formats are 
mathematically clean. For example, it is unwise to 
use a hexadecimal-exponent format for serious 
number crunching because a hexadecimal 
normalization can cause as many as three leading 
zeros between the binary point of the mantissa and 
the first significant bit. This means that as many 
as three least-significant bits may be lost, due to 
right-shifting the mantissa past the available word 
length (truncation) when an extreme hexadecimal 
normalization occurs (about 25 percent of the 
time), and, of course, 2, 1, or no bits may be lost 
(with equal probability) for other possible 
hexadecimal cases. Cascaded calculations can 
quickly cause the low-resolution three-leading-zero 
data words to contaminate a data base. 
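
The bit loss from hexadecimal normalization can be checked numerically. The following is a minimal sketch, not part of the original text; the function name is hypothetical, and it assumes a positive value whose mantissa is normalized so that its leading hexadecimal digit is nonzero. It returns how many leading zero bits that mantissa carries, which is the number of low-order bits lost relative to a binary-normalized mantissa of the same width.

```python
import math

def hex_leading_zero_bits(x):
    """Leading zero bits in the mantissa of x after base-16 normalization."""
    if x <= 0:
        raise ValueError("sketch handles positive values only")
    _, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    h = -(-e // 4)            # smallest hex exponent h with x < 16**h
    return 4 * h - e          # 0, 1, 2, or 3 wasted bits

if __name__ == "__main__":
    for k in range(8):
        print(2 ** k, hex_leading_zero_bits(float(2 ** k)))
```

For the powers of two from 1 to 128 the result cycles through 3, 2, 1, 0, so each case occurs for one binary exponent in four.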


The FPS solution is to use a true 10-bit binary 
exponent, which has more dynamic range than the 
standard 7-bit hexadecimal or 8-bit binary 
exponent. FPS then uses a 28-bit mantissa, plus 
three guard bits in the adder and a double mantissa 
at the multiplier output, which provides enough 
bits to not only allow for hexadecimal in/out 
formats, but also to carry enough information to 
permit post-normalization and convergent-rounding 
after each arithmetic operation. Thus, FPS can 
receive any reasonable floating-point format that 
is desired as the input format, convert it 
on-the-fly to the FPS format, process it in FPS 
format with minimal truncation error propagation, 
and then convert it on-the-fly to the desired 
output format. This procedure allows transparent, 
no-penalty operation on the data, thus preserving 
the integrity of the input data. 
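
Convergent rounding, as the term is commonly used, rounds to the nearest value and breaks exact ties toward the even neighbor, so that tie cases do not introduce a systematic bias. The sketch below is an illustration under that assumption rather than a description of the FPS hardware; the function name and the use of a plain Python integer as the mantissa are hypothetical.

```python
def round_convergent(mantissa, drop_bits):
    """Discard drop_bits low-order bits, rounding to nearest, ties to even."""
    if drop_bits <= 0:
        return mantissa
    half = 1 << (drop_bits - 1)
    kept = mantissa >> drop_bits                   # arithmetic shift (floor)
    remainder = mantissa & ((1 << drop_bits) - 1)  # always non-negative
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept

if __name__ == "__main__":
    # Dropping three guard bits from a few sample mantissas.
    for m in (0b10100, 0b11100, 0b10101, 0b10011):
        print(bin(m), "->", bin(round_convergent(m, 3)))
```

Summed over every possible pattern of discarded bits, the rounding errors of this rule cancel exactly, which is the property that limits error propagation in cascaded operations.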


In addition to the well-chosen floating-point 
format, the AP has a general-purpose, multi-bus 
oriented architecture for the arithmetic units. 
This allows great flexibility in that operands and 
resultants can be moved simultaneously from almost 
any register in the AP to any other. This rather 
generalized structure of the AP allows it to 
execute specialized algorithms, such as the FFT, in 
times comparable to those achieved by hardwired 
special-purpose processors and also makes the AP 
well suited to less highly organized computations. 


In the matter of software, note that this machine 
is a synchronous monolithic multiprocessor, as 
opposed to an asynchronous multiprocessor. The 
practical significance of this is that programming 
by the user and/or FPS (Standard Algorithms, System 
and Test Software) is tremendously simplified due 
to the predictability of data flow and timing 
considerations. There is no need for internal 
handshaking between arithmetic units, memories, 
and microprocessor; data and results are available 
at precisely determined times. The synchronous 
approach not only allows a non-stochastic simulator 
to be written for easy program debugging, but in 
addition, programs may be single-stepped in the 
real processor, with execution identical to 
free-running programs. A further bonus of the 
synchronous design is the easy producibility, 
maintainability, interchangeability, and reliability 
(there is no need to explore an infinite number of 
possible timing conditions as one clock phases by 
another, as happens in an asynchronous machine). 
Convenient and rapid data-dependent branching, 
simple overlapping of data input, arithmetic 
processing, and data output are further examples of 
the care taken to assure a fast, accurate, 
convenient, and reliable array processor. 


CHAPTER 1 


GENERAL INFORMATION 


1.1 INTRODUCTION 


The AP is a high-speed (167ns cycle time) peripheral floating-point 
arithmetic array processor (AP), which is intended to work in parallel 
with a host computer. 


The AP’s internal organization is particularly well suited to 
performing the large numbers of reiterative multiplications and 
additions required in digital signal processing, matrix arithmetic, 
statistical analysis, and numerical simulation. 


The highly-parallel structure of the AP allows the overhead of array 
indexing, loop counting, and data fetching from memory to be performed 
simultaneously with arithmetic operations on the data. This allows 
much faster execution than on a typical general-purpose computer where 
each of the above operations must occur sequentially. 


The AP achieves its high speed through the use of fast commercial 
integrated circuit elements and an architecture that permits each 
logical unit of the machine to operate independently and at maximum 
speed. Specifically: 


• Programs, constants, and data each reside in separate, 
  independent memories to eliminate memory accessing 
  conflicts. 


• Independent floating-point multiply and adder units allow 
  both arithmetic operations to be initiated every 167ns. 


• Two large (32 locations each) blocks of floating-point 
  accumulators are available for temporary storage 
  of intermediate results from the multiplier, adder, 
  or memory. 


• Address indexing and counting functions are performed by an 
  independent integer arithmetic unit that includes 
  16 integer accumulators. 


In a typical application, such as a fast Fourier transform (FFT), the 
above features allow nearly the entire computation to be overlapped 
with data memory access time. 


Effective processing precision is enhanced by 38-bit internal data 
words, an internal floating-point format with optimum numerical 
properties, and a convergent rounding algorithm. 


1.2 SYSTEM OVERVIEW 


A general block diagram of AP arithmetic paths appears in Figure 1-1. 


Connection is made to the host in a manner that permits data transfers 
to occur under control of either the host computer or the AP. For most 
host computers, this means that the AP is interfaced to both the 
programmed I/O and DMA channels. 


The system elements are interconnected with multiple parallel paths so 
that transfers can occur in parallel. All internal floating-point data 
paths are 38 bits wide (10-bit biased binary exponent and 28-bit 2’s 
complement mantissa). 
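
To make the word layout concrete, the sketch below packs a value into a 38-bit word with a 10-bit biased exponent and a 28-bit 2's complement mantissa. Only the field widths come from the text. The bias of 512, the placement of the exponent in the upper bits, the treatment of the mantissa as a fraction scaled by 2**27, and the names `encode` and `decode` are all assumptions made for illustration and may differ from the actual AP format.

```python
import math

EXP_BITS = 10
MANT_BITS = 28
EXP_BIAS = 512                      # assumed; not stated in the text
MANT_MASK = (1 << MANT_BITS) - 1

def encode(x):
    """Pack a Python float into a 38-bit word under the assumed layout."""
    if x == 0.0:
        return 0
    m, e = math.frexp(x)                      # x = m * 2**e, 0.5 <= |m| < 1
    mant = round(m * (1 << (MANT_BITS - 1)))  # round() breaks ties to even
    if mant == 1 << (MANT_BITS - 1):          # rounded up to +1.0: renormalize
        mant >>= 1
        e += 1
    biased = e + EXP_BIAS
    if not 0 <= biased < (1 << EXP_BITS):
        raise OverflowError("exponent out of range")
    return (biased << MANT_BITS) | (mant & MANT_MASK)

def decode(word):
    """Unpack a 38-bit word produced by encode() back into a float."""
    biased = word >> MANT_BITS
    mant = word & MANT_MASK
    if mant & (1 << (MANT_BITS - 1)):         # sign-extend the mantissa
        mant -= 1 << MANT_BITS
    return math.ldexp(mant, biased - EXP_BIAS - (MANT_BITS - 1))

if __name__ == "__main__":
    for value in (1.0, -3.25, 0.1):
        word = encode(value)
        print(value, hex(word), decode(word))
```

With these assumptions a 10-bit binary exponent spans roughly 2**-512 to 2**511, and the 28-bit mantissa gives a relative quantization error of at most about 2**-27.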


Main data memory (MD) is organized in 8K-word modules of 38-bit words 
expandable up to 64K words in the main chassis. The effective memory 
cycle time (interleaved) is 333ns. 


Table memory (TM) is used for storage of constants (FFT constants) and 
is tied to a separate data path so as not to interfere with data 
memory. It is bipolar 167ns read-only memory and is organized in 
512-word, 38-bit increments. 


Data pad X (DPX) and data pad Y (DPY) are two blocks of 32 floating-point 
accumulators. Each is a two-port register block, wherein one register 
may be read and another written from each block in one instruction 
cycle. 


The floating adder (FA) consists of two input registers (A1 and A2) and 
a two-stage pipeline which performs the operations and convergently 
rounds the normalized result. 


The floating multiplier (FM) consists of input registers (M1 and M2) 
and a three-stage pipeline which performs the multiply operation. 
Products are normalized and convergently rounded 38-bit numbers. 


The s-pad consists of 16 integer registers and an integer arithmetic 
unit which is used to form operand addresses and to perform integer 
arithmetic. 


Chapter 2 contains a more detailed description of each of the 
functional elements. Chapter 3 describes programming considerations. 


Chapter 4 describes in detail the host computer interface. A number of 
off-the-shelf interfaces are available from Floating Point Systems, Inc. 


Figure 1-1 General AP Block Diagram 


1.3 EXAMPLE AP APPLICATION 


A simple FFT processing sequence goes as follows: 


Initial conditions are that the FFT program is resident in program 
source memory internal to the AP, the array to be transformed is 
resident in host memory, and the host CPU has initiated the AP 
processor with an I/O instruction. 


1. The AP requests host DMA cycles to transfer the array 
from host memory to internal data memory. Data is 
converted from host floating-point format to internal 
AP floating-point format on-the-fly. 


2. The FFT algorithm is performed with data remaining in 
internal AP format. This yields the benefit of 38-bit 
precision and convergent rounding during the critical 
phases of processing. 


3. The frequency domain array is transferred back to host 
memory by requesting host DMA cycles. Data is converted 
from internal format to host format on-the-fly. 


4. The AP proceeds to another process or stops executing, 
depending on previously established conditions. 
An interrupt to the host can be issued. 


The AP is most efficiently used when a sequence of operations is 
performed on one or more sets of data which reside in internal data 
memory. This reduces data transfer overhead and retains maximum 
numerical precision. For example, a reasonable sequence would be to 
transfer a trace and a filter, FFT both, array multiply, inverse FFT, 
and transfer the result back to host memory. 
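
The sequence just described (transform the trace and the filter, multiply the two arrays, inverse transform) is ordinary frequency-domain filtering. The sketch below shows the arithmetic on the host side in plain Python; it does not use the AP or its library, and the function names and the small recursive radix-2 FFT are assumptions chosen only to keep the example self-contained. Array lengths must be equal powers of two.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unscaled)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def filter_trace(trace, filt):
    """FFT both arrays, multiply element by element, inverse FFT."""
    n = len(trace)
    spectrum = [a * b for a, b in zip(fft(trace), fft(filt))]
    return [value.real / n for value in fft(spectrum, inverse=True)]

if __name__ == "__main__":
    trace = [1, 2, 3, 4, 0, 0, 0, 0]
    filt = [1, 1, 0, 0, 0, 0, 0, 0]
    print([round(v, 6) for v in filter_trace(trace, filt)])
```

The result is the circular convolution of the two arrays, which is why the trace is zero-padded in the usage example. Keeping all three steps on the processor means only the two inputs and one output cross the host interface.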


The AP data memory has DMA capability. That is to say, MD cycles can 
be stolen from the AP microprocessor by the interface. This capability 
allows host computer DMA to AP DMA data transfers to occur, thereby 
minimizing both host CPU and AP overhead. 


The AP is designed with enough flexibility built in so that its power 
can be harnessed in a variety of ways. Subsequent sections describe 
its use in detail. 


1.4 PHYSICAL DESCRIPTION 


The following sections describe the AP hardware. 


1.4.1 GENERAL 


The AP is available in rack configuration. Mounting is as a standard 
19-inch EIA rack-mounted unit requiring 24-1/2 inches of vertical 
space. The unit is equipped with rack slides permitting easy access to 
the etched and/or wire-wrapped circuitry with the chassis mounted on 
the forward portion of the unit. The power panel is mounted at the 
rear. One and three-quarter inches of space should be available above 
and below the 24-1/2 inches of the processor. This is for proper 
intake and exhaust of air through the processor. The control panel 
(refer to section 1.4.4) and/or blank panels may be used for proper 
spacing if the customer’s equipment mounted above and below the 
processor does not have the proper free-air space built into it. 
Intake air should be between 10 and 40 degrees centigrade. 


1.4.2 FORWARD UNIT 


The forward unit contains all AP circuitry except the power supply. 
There is provision for up to 31 15-by-10-inch etched-circuit boards 
(ECB). The ECBs plug into a mother board. The ECBs are arranged in a 
vertical plane (chimney style) with push/pull fans to assure adequate 
upwards air circulation even in the event of a fan failure. The I/O 
cable exits at the bottom rear (the exact configuration is computer 
dependent). This unit is called the processor. 


1.4.3 REAR UNIT 


The power supply consists of three assemblies. The first is the main 
+5 volt supply and is capable of 100 amperes output. The other smaller 
supplies are -5 and +12 volts. The power supplies have forced 
convection cooling. All supplies are rear-mounted, along with the line 
box (containing line filters and contactor), on the power panel. 


1.4.4 POWER, CONTROLS, AND INDICATORS 


The AP is normally expected to be powered up and down with the rest of 
the system. The AP switch and indicators are on a control panel. 

There is a single power cord (US standard three-wire with ground) which 
must be connected to 105 to 125 volts, 50 to 60 hertz. The service 
should be rated for 20 amperes (10 amperes in the case of the higher 
voltage ranges) in order to provide a low-impedance source (power 
required is approximately 1200 volt-amps). The control panel may be mounted above 
or below either the processor or the power panel. Availability of line 
power is indicated by a neon LINE VOLTAGE indicator. If the ON/OFF 
switch is on, then power supplies should come on. There are two 
operation indicators: one shows array processor action and the other 
shows DMA transfers. The three individual power supplies have separate 
indicators (electroluminescent diodes). There are no external 
adjustments. The internal adjustments are the three power supply 
setting potentiometers on the power panel. 


1.4.5 SERIAL NUMBERS 


The processor has a serial number tag, ending in A, on its starboard 
side near the top and forward. The power panel tag, ending in B, is 
located inside and near the top. The control panel has its tag, ending 
in C, also inside. 


REQUIREMENTS 


1) ENVIRONMENT: 0 - 40°C, 0 - 90% RELATIVE HUMIDITY. 
   (DERATE 1°C PER 2500 FT. (762 M) ABOVE SEA LEVEL, 5°C FOR 50 HERTZ OPERATION.) 


2) POWER CONSUMPTION = 1090 W; SERVICE: 

   A. 105 - 125 VOLTS, 50 - 60 HERTZ @ 20 AMPS. (VOLTAGE OPTION "A" HAS A WHITE WIRE IN THE FAN POWER CABLE.) 
   B. 188 - 223 VOLTS, 50 - 60 HERTZ @ 10 AMPS. (VOLTAGE OPTION "B" HAS A BLUE WIRE IN THE FAN POWER CABLE.) 
   C. 210 - 262 VOLTS, 50 - 60 HERTZ @ 10 AMPS. (VOLTAGE OPTION "C" HAS A RED WIRE IN THE FAN POWER CABLE.) 
   D. LOW IMPEDANCE SERVICE ADVISED. 


3) SPACE: 
   *HEIGHT: WITH CONTROL PANEL AT THE FRONT: 24-1/2" (62.23 CM). 
            WITH CONTROL PANEL AT THE REAR: 22-3/4" (57.79 CM). 
   WIDTH: 19" (48.26 CM). 
   DEPTH: 20 - 25" (50.80 - 63.50 CM). 


*CAUTION: ALLOW AT LEAST 1.75" OF FREE AIR SPACE ABOVE THE AP IF USED AS SHOWN. IF THE CONTROL PANEL IS MOVED, 
ALLOW 1.75" OF FREE AIR SPACE BELOW THE AP. 


NOTE: THE POWER PANEL TO AP POWER CABLE IS LOCATED ON THE LOWER RIGHT SIDE (NOT SHOWN). 


Figure 1-2 AP Physical Configuration 


1.5 SOFTWARE 


Four software packages can be supplied with the AP to assist the 
user in solving a particular processing task. 


1.5.1 APEX (AP EXECUTIVE) 


APEX is a mechanism for communicating with the AP via a series of 
FORTRAN or machine language subroutine calls. The executive driver 
routine interprets the particular user call and directs the AP to 
perform the specified action. For example, in FORTRAN, to load an 
array A containing N real data points into the AP and perform a real 
fast Fourier transform upon that data: 


CALL APPUT (A,IA,N,2) 


CALL RFFT (IA,N,1) 


Both the standard applications subroutines described below and 
user-developed AP programs may be called from the host computer using 
APEX. 


1.5.2 APMATH (AP MATH LIBRARY) 


There are 239 subroutines written in AP assembly language. They are 
callable from the host computer FORTRAN or machine language using APEX. 
They are listed in Table 1-3. 


1.5.3 PROGRAM DEVELOPMENT PACKAGE 


Six FORTRAN IV programs, which are compiled on the host computer during 
installation, aid user program development. 


These are: 


APAL          AP assembly language. Cross-assembler 
              which provides a two-pass assembly of 
              symbolic coding into an object module. 
              APAL generates detailed error diagnostics. 


APLOAD        AP loader. Links and relocates 
              separate APAL and AP-FORTRAN object 
              modules together. It produces a load 
              module and a host FORTRAN subroutine 
              which transfers the load module to the 
              AP. 


APDBUG        AP debugger. Interactive debugging 
              program. The user may selectively set 
              breakpoints, examine and change memory 
              and register contents, and run program 
              segments. 


APSIM         AP simulator. Called by APDBUG, APSIM 
              provides a programmed simulation of the 
              various hardware elements of the AP. 
              All timing characteristics of the AP 
              are emulated and the floating-point 
              arithmetic is simulated (including 
              rounding) to the least significant bit. 
              APSIM is a convenient tool in bringing 
              up new AP programs off-line without 
              interfering with production runs. 


VFC           Vector Function Chainer. A translator 
              to convert VFC syntax to AP assembly 
              language (APAL). It consolidates 
              multiple CALLs to the AP from the host 
              computer into one CALL whenever possible. 


AP-FORTRAN    Array processor FORTRAN. A compiler 
              which allows FORTRAN subprograms to 
              execute on the AP. The compiler produces 
              object modules which are used as input to 
              the AP loader (APLOAD). 


1.5.4 APTEST (AP TEST PROGRAMS) 


APTEST is a collection of interactive diagnostic tests and verification 
programs which aid in isolating hardware faults. 


These are: 


APTEST        AP tester. Exercises the panel, DMA 
              interface, and various internal registers 
              and memories. Tests main data memory with 
              simple patterns and then with random 
              numbers. Board-level diagnostic indicators 
              are provided. 


APPATH        AP path tester. Tests the various 
              internal data paths and gives board-level 
              diagnostics. 


APARTH        AP arithmetic test. Tests the 
              floating-point adder, multiplier, and 
              s-pad arithmetic unit with pseudorandom 
              number and operation sequences. 


FIFFT         Forward/inverse FFT test. Verifies 
              the correct operation of the AP 
              as a complete unit by doing 
              forward/inverse FFT transforms on both 
              spikes and random number sequences. 


Table 1-1 Floating-Point Arithmetic Times 


OPERATION                  TRAVEL TIME    PIPELINE INTERVAL 


Add/Subtract                              0.167 us 
Multiply                                  0.167 us 
Multiply-Add                              0.167 us 
Complex Add/Subtract                      0.333 us 
Complex Multiply                          0.667 us 
Complex Multiply-Add                      0.667 us 


Travel time is the total time required to get from the data source to 
the destination including the full transport through the arithmetic 
units. Pipeline interval is the time between successively available 
resultants. The former is important when the successive arguments of a 
computation depend on previous calculations. The latter is indicative 
of the maximum throughput rate available for successively independent 
calculations. 
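
The distinction between travel time and pipeline interval can be sketched with a simple timing model. The model below is an assumption made for illustration: it takes the 167ns cycle and the two-stage adder and three-stage multiplier pipelines from section 1.2, treats one unit's latency as its stage count times the cycle, and ignores operand fetch and result storage, so it does not reproduce the travel times of Table 1-1. The function names are hypothetical.

```python
CYCLE_NS = 167       # machine cycle time from section 1.1
ADDER_STAGES = 2     # floating adder pipeline depth (section 1.2)
MULTIPLIER_STAGES = 3  # floating multiplier pipeline depth (section 1.2)

def independent_ns(count, stages):
    """Time for `count` independent operations streamed through a pipeline.

    The first result appears after `stages` cycles; each later result
    follows one cycle (one pipeline interval) behind the previous one.
    """
    return (stages + count - 1) * CYCLE_NS

def dependent_ns(count, stages):
    """Time for `count` operations where each needs the previous result.

    The pipeline cannot be kept full, so every operation pays the
    full latency of the unit.
    """
    return count * stages * CYCLE_NS

if __name__ == "__main__":
    n = 1000
    print("independent adds:", independent_ns(n, ADDER_STAGES), "ns")
    print("dependent adds:  ", dependent_ns(n, ADDER_STAGES), "ns")
    print("independent muls:", independent_ns(n, MULTIPLIER_STAGES), "ns")
    print("dependent muls:  ", dependent_ns(n, MULTIPLIER_STAGES), "ns")
```

For long arrays of independent operations the total time approaches one pipeline interval per result, while a chain of dependent operations is governed by the latency of each step.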


Table 1-2 Basic Scalar Functions 


OPERATION                  TYPICAL EXECUTION    PROGRAM SIZE 
                           TIME/LOOP (us)       (AP PS WORDS) 


Arctangent                 8.7                  74  74 

Arctangent of (Y/X)        13.3  13.8           74 


These functions take arguments from data pad and return full-word 
accuracy results to data pad. Full-precision polynomial coefficients 
for these functions are contained on the standard 512 words of table 
memory. 


Table 1-3 Summary of AP FORTRAN Callable Routines 


                                                       Typical      Program 
                                                       Execution    Size 
Name      Operation                                    Time/Loop    (AP 
                                                       (us)         PS words) 
                                                       167 | 333    167 | 333 


          DATA TRANSFER AND CONTROL OPERATIONS (APEX) 

          PUT DATA INTO THE AP                         #.#   #.# 
          GET DATA FROM THE AP                         #.#   #.# 
          INITIALIZE THE AP                            #.#   #.# 
          WAIT FOR AP DATA TRANSFER                    #.#   #.# 
          WAIT FOR AP PROGRAM EXECUTION                #.#   #.# 
          WAIT FOR AP                                  #.#   #.# 
          READ AN AP S-PAD REGISTER                    #.#   #.# 
          CHECK AP PROGRAM ERROR CONDITION             #.#   #.# 
          GET AP HARDWARE STATUS                       #.#   #.# 


VCLR      VECTOR CLEAR 
VMOV      VECTOR MOVE 
VSWAP     VECTOR SWAP 
VFILL     VECTOR FILL 
VRAMP     VECTOR RAMP 
VNEG      VECTOR NEGATE 
VADD      VECTOR ADD 
VSUB      VECTOR SUBTRACT 
VMUL      VECTOR MULTIPLY 
VDIV      VECTOR DIVIDE 
VSADD     VECTOR SCALAR ADD 
VSMUL     VECTOR SCALAR MULTIPLY 
VTSADD    VECTOR TABLE SCALAR ADD 
VTSMUL    VECTOR TABLE SCALAR MULTIPLY 
VSQ       VECTOR SQUARE 
VSSQ      VECTOR SIGNED SQUARE 
VABS      VECTOR ABSOLUTE VALUE 
VSQRT     VECTOR SQUARE ROOT 
VLOG      VECTOR LOGARITHM (BASE 10) 
VLN       VECTOR NATURAL LOGARITHM 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


Typical Program 
Execution Size 
Name Operation Time/Loop (AP 
(us ) PS words) 
167 | 333 167 | 333 
VALOG VECTOR ANTILOGARITHM (BASE 10) 2.3 2.3 58 58 
VEXP VECTOR EXPONENTIAL 2.3 2.3 55 55 
VSIN VECTOR SINE 1.3 1.3 34 34 
VCOS VECTOR COSINE 1.3 1.3 34 34 
VATAN VECTOR ARCTANGENT 9.7 9.8 87 87 
VATN2 VECTOR ARCTANGENT OF Y/X 14.2 14.2 88 88 
VRAND VECTOR RANDOM NUMBERS 1.2 1.2 16 16 
VMSA VECTOR MULTIPLY AND SCALAR ADD 0.8 1.3 23 14 
VSMA VECTOR SCALAR MULTIPLY AND ADD 0.8 1.3 21 14 
VSMSB VECTOR SCALAR MULTIPLY AND SUBTRACT 0.8 1.3 21 14 
VMA VECTOR MULTIPLY AND ADD 1.2 1.8 23 15 
VMSB VECTOR MULTIPLY AND SUBTRACT 1.2 1.8 23 15 
VAM VECTOR ADD AND MULTIPLY 1.2 1.8 23 14 
VSBM VECTOR SUBTRACT AND MULTIPLY 1.2 1.8 23 14 
VSMSA VECTOR SCALAR MULTIPLY AND SCALAR ADD 0.5 0.8 23 15 
VMMA VECTOR MULTIPLY, MULTIPLY, AND ADD 1.5 2.3 27 19 
VMMSB VECTOR MULTIPLY, MULTIPLY, AND SUBTRACT 1.5 2.3 27 19 
VAAM VECTOR ADD, ADD, AND MULTIPLY 1.5 2.3 13 20 
VSBSBM VECTOR SUBTRACT, SUBTRACT, AND MULTIPLY 1.5 2.3 13 20 
VAND VECTOR LOGICAL AND 0.8 1.3 20 8 
VEQV VECTOR LOGICAL EQUIVALENCE 0.8 1.3 20 8 
VOR VECTOR LOGICAL OR 0.8 1.3 20 8 
VFRAC VECTOR TRUNCATE TO FRACTION 0.7 0.8 13 13 
VINT VECTOR TRUNCATE TO INTEGER 0.5 0.8 9 9 
VINDEX VECTOR INDEX 0.8 1.3 28 26 
VECTOR-TO-SCALAR OPERATIONS 
SVE SUM OF VECTOR ELEMENTS 0.3 0.3 7 7 
SVEMG SUM OF VECTOR ELEMENT MAGNITUDES 0.3 0.3 10 10 
SVESQ SUM OF VECTOR ELEMENT SQUARES 0.3 0.3 10 10 
SVS SUM OF VECTOR SIGNED SQUARES 0.3 0.3 11 11 
DOTPR DOT PRODUCT 0.5 0.8 Zs 9 
MAXV MAXIMUM ELEMENT IN VECTOR 0.3 0.3 19 19 
MINV MINIMUM ELEMENT IN VECTOR 0.3 0.3 19 19 
MAXMGV MAXIMUM MAGNITUDE ELEMENT IN VECTOR 0.3 0.3 19 19 
MINMGV MINIMUM MAGNITUDE ELEMENT IN VECTOR 0.3 0.3 19 19 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


Typical Program 
Execution Size 
Name Operation Time /Loop (AP 
(us ) PS words) 
167 | 333 167 | 333 
MEANV MEAN VALUE OF VECTOR ELEMENTS 0.3 0.3 49 49 
MEAMGV MEAN OF VECTOR ELEMENT MAGNITUDES 0.3 0.3 52 52 
MEASQV MEAN OF VECTOR ELEMENT SQUARES 0.3 0.3 52 52 
RMSQV ROOT-MEAN-SQUARE OF VECTOR ELEMENTS 0.3 0.3 81 81 
VECTOR COMPARISON OPERATIONS 
VMAX VECTOR MAXIMUM 0.8 1.3 22 13 
VMIN VECTOR MINIMUM 0.8 1.3 22 13 
VMAXMG VECTOR MAXIMUM MAGNITUDE 0.8 1.3 14 14 
VMINMG VECTOR MINIMUM MAGNITUDE 0.8 1.3 14 14 
VCLIP VECTOR CLIP 0.5 0.8 16 16 
VICLIP VECTOR INVERTED CLIP 0.7 0.8 19 19 
VLIM VECTOR LIMIT 0.5 0.8 14 14 
LVGT LOGICAL VECTOR GREATER THAN 0.8 1.3 23 13 
LVGE LOGICAL VECTOR GREATER THAN OR EQUAL 0.8 1.3 23 13 
LVEQ LOGICAL VECTOR EQUAL 0.8 1.3 23 13 
LVNE LOGICAL VECTOR NOT EQUAL 0.8 1.3 23 13 
LVNOT LOGICAL VECTOR NOT 0-5 0.8 24° “32 
VLMERG VECTOR LOGICAL MERGE 0.8 1.5 23 16 
COMPLEX VECTOR ARITHMETIC 
CVMOV COMPLEX VECTOR MOVE 0.8 1.3 9 9 
CVFILL COMPLEX VECTOR FILL 0.5 0.7 8 8 
CVCOMB COMPLEX VECTOR COMBINE il 1.7 10 10 
CVREAL FORM COMPLEX VECTOR OF REALS 0.8 rw 9 9 
VREAL EXTRACT REALS OF COMPLEX VECTOR 0.5 0.8 17 7 
VIMAG EXTRACT IMAGINARIES OF COMPLEX VECTOR 0.5 0.8 18 8 
CVNEG COMPLEX VECTOR NEGATE 0.8 1.3 11 11 
CVCONJ COMPLEX VECTOR CONJUGATE 0.7 1.3 10 12 
CVADD COMPLEX VECTOR ADD 1.0 2.0 13 12 
CVSUB COMPLEX VECTOR SUBTRACT 1.0 2.0 13 12 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


Typical Program 
Execution Size 
Name Operation Time/Loop (AP 
(us ) PS words) 
167 | 333 167 | 333 
CVMUL COMPLEX VECTOR MULTIPLY 1.0 2.0 25 26 
CVSMUL COMPLEX VECTOR SCALAR MULTIPLY 0.8 1.3 12 12 
CVRCIP COMPLEX VECTOR RECIPROCAL Bre 5.2 50 50 
CRVADD COMPLEX AND REAL VECTOR ADD 1.3 1.8 14 14 
CRVSUB COMPLEX AND REAL VECTOR SUBTRACT 1.3 1.8 14 14 
CRVMUL COMPLEX AND REAL VECTOR MULTIPLY 1.3 1.8 14 14 
CRVDIV COMPLEX AND REAL VECTOR DIVIDE 3.3 3.3 92 92 
CVMA COMPLEX VECTOR MULTIPLY AND ADD 1.3 2.7 29 30 
CVMAGS COMPLEX VECTOR MAGNITUDE SQUARED 0.7 1.2 13 18 
SCJMA SELF-CONJUGATE MULTIPLY AND ADD 0.8 135 14 15 
POLAR RECTANGULAR TO POLAR CONVERSION 19.5 19.5 120 120 
RECT POLAR TO RECTANGULAR CONVERSION 2.3 2.3 49 49 
CVEXP COMPLEX VECTOR EXPONENTIAL 2.0 2.0 43 43 
CVMEXP VECTOR MULTIPLY COMPLEX EXPONENTIAL 2.3 2.3 48 48 
CDOTPR COMPLEX DOT PRODUCT 0.7 1.3 15 16 
DATA FORMATTING OPERATIONS 
VFLT VECTOR INTEGER FLOAT 0.5 0.8 13 11 
VFIX VECTOR INTEGER FIX 0.7 0.8 18 7 
VSMAFX VECTOR SCALAR MULTIPLY, ADD, AND FIX 0.7 0.8 14 13 
VSCALE VECTOR SCALE (POWER 2) AND FIX 0.7 0.8 12 12 
VSCSCL VECTOR SCAN, SCALE (POWER 2) AND FIX 1.5 1.7 19 19 
VSHFX VECTOR SHIFT AND FIX 0.7 0.8 9 9 
VUP8 VECTOR 8-BIT BYTE UNPACK 0.5 0.5 71 71 
VUPS8 VECTOR 8-BIT SIGNED BYTE UNPACK 0.9 0.9 107 107 
VPK8 VECTOR 8-BIT BYTE PACK 0.9 0.9 65 65 
VUP16 VECTOR 16-BIT BYTE UNPACK 0.8 0.8 61 61 
VUPS16 VECTOR 16-BIT SIGNED BYTE UNPACK 1.3 | s 58 58 
VPK16 VECTOR 16-BIT BYTE PACK 0.8 0.8 46 46 
VFLT32 VECTOR 32-BIT INTEGER FLOAT bey Led 65 65 
VFIX32 VECTOR 32-BIT INTEGER FIX 1.2 1.2 33 33 
VSEFLT VECTOR SIGN EXTEND AND FLOAT 0.8 0.8 15 15 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


                                                       Typical      Program 
                                                       Execution    Size 
Name      Operation                                    Time/Loop    (AP 
                                                       (us)         PS words) 
                                                       167 | 333    167 | 333 


          MATRIX OPERATIONS 

MTRANS    MATRIX TRANSPOSE                                          18   22 
MMUL      MATRIX MULTIPLY                                           59   59 
MMUL32    MATRIX MULTIPLY (DIMENSION <=32)                          27   27 
MATINV    MATRIX INVERSE                                            160  160 
SOLVEQ    LINEAR EQUATION SOLVER                                    216  222 
MVML3     MATRIX VECTOR MULTIPLY (3X3)                              30   30 
MVML4     MATRIX VECTOR MULTIPLY (4X4)                              39   39 
CTRN3     3-DIMENSION COORDINATE TRANSFORMATION                     37   37 
FMMM      FAST MEMORY MATRIX MULTIPLY                               61 
FMMM32    FAST MEMORY MATRIX MULTIPLY (<=32)                        30 


          COMPLEX TO COMPLEX FFT (IN PLACE)                         186  184 
          COMPLEX TO COMPLEX FFT (NOT IN PLACE)                     189  189 
          REAL TO COMPLEX FFT (IN PLACE)                            233  251 
RFFTB     REAL TO COMPLEX FFT (NOT IN PLACE)                        Zoe. 252 
CFFTSC    COMPLEX FFT SCALE                                         42   42 
RFFTSC    REAL FFT SCALE AND FORMAT                                 59   59 
CFFT2D    COMPLEX TO COMPLEX 2-DIMENSIONAL FFT                      274  274 
RFFT2D    REAL TO COMPLEX 2-DIMENSIONAL FFT                         585  585 


          CONVOLUTION (CORRELATION)                                 106  106 
          DIFFERENCE EQUATION, 2 POLES, 2 ZEROS                     25   25 
          VECTOR POLYNOMIAL EVALUATION                              41   41 
          VECTOR SUM OF ELEMENTS INTEGRATION                        13   13 

Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


                                                       Typical      Program 
                                                       Execution    Size 
Name      Operation                                    Time/Loop    (AP 
                                                       (us)         PS words) 
                                                       167 | 333    167 | 333 


VTRAPZ    VECTOR TRAPEZOIDAL RULE INTEGRATION                       16   16 
VSIMPS    VECTOR SIMPSONS 1/3 RULE INTEGRATION                      25   25 
WIENER    WIENER LEVINSON ALGORITHM                                 100  100 


          SIGNAL PROCESSING OPERATIONS (optional) 

HIST      HISTOGRAM                                                 71   71 
HANN      HANNING WINDOW MULTIPLY                                   41   41 
ASPEC     ACCUMULATING AUTO-SPECTRUM                                21   22 
CSPEC     ACCUMULATING CROSS-SPECTRUM                               39   40 
VAVLIN    VECTOR LINEAR AVERAGING                                   54   46 
VAVEXP    VECTOR EXPONENTIAL AVERAGING                              55   46 
VDBPWR    VECTOR CONVERSION TO DB (POWER)                           75   75 
TRANS     TRANSFER FUNCTION                                         100  100 
COHER     COHERENCE FUNCTION                                        109  114 
ACORT     AUTO-CORRELATION (TIME-DOMAIN)                            IZ -AZL 
ACORF     AUTO-CORRELATION (FREQUENCY-DOMAIN)                       501  489 
CCORT     CROSS-CORRELATION (TIME-DOMAIN)                           L2tt 2k 
CCORF     CROSS-CORRELATION (FREQUENCY-DOMAIN)                      526  510 
TCONV     POSTTAPERED CONVOLUTION (CORRELATION)                     big~ ele 


Legible Time/Loop entries (us, 167 | 333), rows not recoverable: 
0.29* 0.29 
2.58* 3.93 
0.30* 0.30 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


Typical Program 
Execution Size 
Name Operation Time/Loop (AP 
(us ) PS words) 


167 | 333 167 | 333 


TABLE MEMORY OPERATIONS (optional) 


MTMOV VECTOR MOVE (MD TO TM) 0.2 0.3 6 7 
TMMOV VECTOR MOVE (TM TO MD) 0.2 0.3 5 5 
MTIMOV VECTOR MOVE WITH INCREMENT (MD TO TM) 0.5 0.5 7 7 
TMIMOV VECTOR MOVE WITH INCREMENT (TM TO MD) 0.3 0.3 15 15 
TTIMOV VECTOR MOVE WITH INCREMENT (TM TO TM) 0.5 0.5 7 7 
MMTADD VECTOR ADD (MD+MD TO TM) 0.7 0.8 20 13 
MMTSUB VECTOR SUBTRACT (MD-MD TO TM) 0.7 0.8 20 13 
MMTMUL VECTOR MULTIPLY (MD*MD TO TM) 0.7 0.8 20 13 
MTMADD VECTOR ADD (MD+TM TO MD) 0.5 0.8 20 9 
MTMSUB VECTOR SUBTRACT (MD-TM TO MD) 0.5 0.8 20 9 
TMMSUB VECTOR SUBTRACT (TM-MD TO MD) 0.5 0.8 20 9 
MTMMUL VECTOR MULTIPLY (MD*TM TO MD) 0.5 0.8 20 9 
MTTADD VECTOR ADD (MD+TM TO TM) 0.5 0.5 20 20 
MTTSUB VECTOR SUBTRACT (MD-TM TO TM) 0.5 0.5 20 20 
TMTSUB VECTOR SUBTRACT (TM-MD TO TM) 0.5 0.5 20 20 
MTTMUL VECTOR MULTIPLY (MD*TM TO TM) 0.5 0.5 20 20 
TTMADD VECTOR ADD (TM+TM TO MD) 0.5 0.5 20 20 
TTMSUB VECTOR SUBTRACT (TM-TM TO MD) 0.5 0.5 20 20 
TTMMUL VECTOR MULTIPLY (TM*TM TO MD) 0.5 0.5 20 20 
TTTADD VECTOR ADD (TM+TM TO TM) 0.7 0.7 9 2 
TTTSUB VECTOR SUBTRACT (TM-TM TO TM) 0.7 0.7 9 9 
TTTMUL VECTOR MULTIPLY (TM*TM TO TM) 0.7 0.7 10 10 
FPS 860-7259-003 1 - 20 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


                                                       Typical      Program 
                                                       Execution    Size 
Name      Operation                                    Time/Loop    (AP 
                                                       (us)         PS words) 
                                                       167 | 333    167 | 333 


          SCALAR DIVIDE                                             28   28 
          SCALAR SQUARE ROOT                                        28   28 
          SCALAR LOGARITHM (BASE 10)                                37   37 
          SCALAR NATURAL LOGARITHM                                  37   37 
          SCALAR EXPONENTIAL                                        28   28 
          SCALAR SINE                                               35   35 
          SCALAR COSINE                                             35   35 
          SCALAR ARCTANGENT                                         74   74 
          SCALAR ARCTANGENT OF Y/X                                  74   74 
SPFLT     FLOAT S-PAD INTEGER                                       5    5 
SPUFLT    S-PAD UNSIGNED FLOAT                                      8    8 
SPNEG     S-PAD NEGATE                                              2    2 
SPADD     S-PAD ADD                                                 1    1 
SPSUB     S-PAD SUBTRACT                                            1    1 
SPMUL     S-PAD MULTIPLY                                            14   14 
SPDIV     S-PAD DIVIDE                                              43   43 
SPRS      S-PAD RIGHT SHIFT                                         5    5 
SPLS      S-PAD LEFT SHIFT                                          5    5 
SPAND     S-PAD AND                                                 1    1 
SPOR      S-PAD OR                                                  1    1 
SPNOT     S-PAD NOT                                                 1    1 
SAVESP    SAVE S-PAD INTO PROGRAM MEMORY                            18   18 
SAVSP0    SAVE S-PAD 0 INTO PROGRAM MEMORY                          11   11 
SETSP     LOAD S-PADS FROM PROGRAM MEMORY                           33   33 
SET2SP    LOAD 2 S-PADS FROM PROGRAM MEMORY                         33   33 
MDCOM     MAIN DATA COMPARE AND SET S-PAD                           11   11 
ZMD       CLEAR ALL PAGES OF MAIN DATA MEMORY                       29   29 
RDC5      READ CONTROL BIT 5 INTERRUPT                              9    9 
SETC5     SET CONTROL BIT 5 INTERRUPT                               1    1 
DAREAD    READ DEVICE ADDRESS REGISTER                              2    2 
DAWRIT    WRITE DEVICE ADDRESS REGISTER                             2    2 
VFCL1     VECTOR FUNCTION CALLER (1 ARGUMENT)                       10   10 
VFCL2     VECTOR FUNCTION CALLER (2 ARGUMENT)                       11   11 
BITREV    COMPLEX VECTOR BIT REVERSE ORDERING                       45   43 
REALTR    REAL FFT UNRAVEL AND FINAL PASS                           68   68 
FFT2      RADIX 2 FFT FIRST PASS                                    16   16 


Table 1-3 Summary of AP FORTRAN Callable Routines (cont.) 


                                                      Typical        Program 
                                                      Execution      Size 
Name     Operation                                    Time/Loop      (AP 
                                                      (us)           PS words) 
                                                      167 | 333      167 | 333 

FFT4     RADIX 4 FFT PASS                             3.7    5.3      79    79 
FFT2B    RADIX 2 FFT FIRST PASS + BIT REVERSE         1.3    2.7      25     2 
FFT4B    RADIX 4 FFT FIRST PASS + BIT REVERSE         207    523      43    43 
STSTAT   SET FFT MODE STATUS BITS                     5.0  @ 5.0      19    19 
CLSTAT   CLEAR FFT MODE STATUS BITS                   0.5  @ 0.5      19    19 
ILOG2    LOGARITHM (BASE 2)                           4.90 @ 4.0      19    19 
ADV2     ADVANCE POINTERS AFTER RADIX 2 FFT           0.7  @ 0.7       7     7 
ADV4     ADVANCE POINTERS AFTER RADIX 4 FFT           0.7  @ 0.7 
SET24B   SETUP FOR FFT2B AND FFT4B                    1.2  @ 1.2       8     8 
XCFFT    EXPANDED COMPLEX FFT                         0.32*  0.42    187   187 
XRFFT    EXPANDED REAL FFT                            0.19*  0.28    256   256 
XBITRE   EXPANDED BIT REVERSE                         3.7    3.7      44    44 
XREALT   EXPANDED REAL FFT FINAL PASS                 0.4    0.7      71    71 
PCFFT    PARTIAL COMPLEX FFT                          1.05*  1.50 
XFFT4    EXPANDED RADIX 4 FFT PASS                    3.7    5.3      79    79 
CTOR     COMPLEX TO REAL FFT UNSCRAMBLE               0.13*  0.13     80    80 
RTOC     REAL TO COMPLEX FFT SCRAMBLE                 0.19*  0.09    143   143 
SSDA     SINGLE + SINGLE TO DOUBLE ADD                1.5  @ 1.5      10    10 
SSDM     SINGLE * SINGLE TO DOUBLE MULTIPLY          11.5  @11.5      81    81 
SDDA     SINGLE + DOUBLE TO DOUBLE ADD                4.5  @ 4.5      28    28 
DDDA     DOUBLE + DOUBLE TO DOUBLE ADD                7.5  @ 7.5      48    48 
DDDM     DOUBLE * DOUBLE TO DOUBLE MULTIPLY          18.5  @18.5     117   117 


NOTE 
#.# Timing host system dependent 


* Refer to description of routine for explanation of timing 
@ Total execution time 
Table 1-4 Convolution (Correlation) 


ELEMENT COUNTS    TYPICAL EXECUTION TIME/LOOP (us) 
cS) 128 0.28 3.28 
32 128 0.83 0.83 
Table 1-5 Fast Fourier Transforms 



64 0.18 0.27 0.14 - 0.20 . 0.28 0.20 


256 6.74 1.13 . 0,58 9.90 


512 1.50 i a 1.20 1.76, 


1024 3.30 - 5,08 - 2.70 4,18 foes | : 4.75 


4096 14.95 22.96 12.56 19.37 22.66 


16384 67.19 102.70 98.36 124.70 183.27 105.58 


32768 138.42 205.35 | 119.30 176.68 po | -- ee 
CHAPTER 2 


FUNCTIONAL DESCRIPTION 


2.1 INTRODUCTION 

The hardware of the AP is composed of the following three types of 
functional elements: 

• logical and control elements 
    control unit 
    s-pad unit 
• floating-point arithmetic elements 
    floating-point adder 
    floating-point multiplier 
• memory elements 
    data pad unit 
    main data memory unit 
    table memory unit 

Each of these functional units is independent and thus can 
independently perform the programmed operations for which it was 
designed in parallel with the other functional units. 
2.2 CONTROL UNIT 


The control unit, as illustrated by Figure 2-1, consists of: 

• program source memory (PS) 
• program source address (PSA) register 
• control buffer (CB) with decoding logic 
• subroutine return stack (SRS) 


The operation of the AP is controlled by the execution of 64-bit 
instruction words which reside in program source (PS) memory. The 
program word for the next instruction to be performed is selected by 
the address in the program source address (PSA) register. At the 
initiation of the next machine cycle, this program word is transferred 
to the control buffer (CB) where it is decoded and executed. The PSA 
is incremented by one unless a branch in the current instruction causes 
the PSA to move to another location in program source memory. Access 
to program source memory and instruction decoding is overlapped so that 
the AP can operate at a 6-MHz rate (167ns). 


Branching is accomplished in two ways. A short-range branch is 
provided by adding the 5-bit branch displacement field to the current 
PSA. This gives a branch range of -20 to +17 octal (-16 to +15 
decimal). A long-range jump to any location in PS is accomplished by 
loading the desired target address into PSA. 
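
The short-range branch arithmetic can be sketched in Python. This is only an illustration: it assumes the 5-bit field is a two's-complement value (which the stated range implies), and the name `branch_target` is hypothetical, not an AP mnemonic.

```python
def branch_target(psa: int, field: int) -> int:
    """Add a 5-bit displacement field to the current PSA.

    Assumption: the field is two's complement, so raw values 0..15 branch
    forward and raw values 16..31 mean -16..-1.
    """
    if not 0 <= field < 32:
        raise ValueError("displacement field must fit in 5 bits")
    displacement = field - 32 if field & 0b10000 else field
    return psa + displacement


# Extremes of the range, printed in octal as the manual quotes them.
base = 0o100
print(oct(branch_target(base, 0b10000) - base))  # most negative displacement
print(oct(branch_target(base, 0b01111) - base))  # most positive displacement
```

The two printed displacements correspond to the -20 and +17 octal limits quoted above.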


Subroutine jumps are made by a JSR instruction which saves the current 
PSA in the subroutine return stack and sets PSA to the subroutine 
address. Return is via a return instruction, which loads the PSA with 
the last-entered return address on the SRS. 


Subroutine return address (SRA) is the subroutine return stack pointer, 
which is automatically incremented or decremented as subroutines are 
called and returns are made from the subroutine. 
Figure 2-1 Control Unit 
(block diagram: subroutine return stack with its pointer SRA, program 
source address (PSA), program source memory (PS), control buffer (CB)) 


2.3 S-PAD UNIT 


This unit, illustrated by Figure 2-2, performs the integer address 
indexing, loop counting, and control functions necessary to direct 
completion of a given algorithm. In form, it is similar to familiar 
minicomputers such as the PDP-11 and Nova. 


The s-pad contains sixteen 16-bit directly-addressable registers. The 
contents of these registers pass through a special integer ALU 
associated with this unit. 


The output of the ALU may be directed back to the specified s-pad 
destination register and also may be directed to any of the following 
address registers: memory address (MA), table memory address 
(TMA), or data pad address (DPA). 
The s-pad integer ALU functions include the following: 


function               effect 

move                   S → D 
logical complement     NOT S → D 
clear                  0 → D 
increment              S+1 → D 
decrement              S-1 → D 
add                    D+S → D 
subtract               D-S → D 
logical AND            D AND S → D 
logical OR             D OR S → D 
logical equivalence    D EQV S → D 

where S is the source register and D is the destination register. 


The output of the s-pad ALU (called S-PAD FUNCTION or SPFN) may be used 
unmodified, shifted left once, shifted right once, or shifted right 
twice. 


A hardware bit-reverse function included in the s-pad accomplishes the 
bit swapping necessary to access data in scrambled order after an FFT. 


The s-pad ALU also sets three condition bits in the AP status register 
depending upon the output of the ALU/shifter: 

N: set if result <0; cleared otherwise 

Z: set if result =0; cleared otherwise 


C: set if a carry occurred; cleared otherwise 


These bits may be tested by the next AP instruction, and a branch made, 
depending upon whether the specified condition is true. 
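
As an illustration of the condition bits and the bit-reverse function, here is a minimal Python sketch. It assumes 16-bit two's-complement arithmetic, ignores the shifter, and uses hypothetical names (`spad_add`, `bit_reverse`); the width over which the hardware reverses bits is not stated here, so it is a parameter.

```python
MASK16 = 0xFFFF


def spad_add(d: int, s: int):
    """Model the add function D+S -> D; returns (result, N, Z, C)."""
    total = (d & MASK16) + (s & MASK16)
    result = total & MASK16
    n = bool(result & 0x8000)   # sign bit set: negative in two's complement
    z = result == 0             # all 16 bits clear
    c = total > MASK16          # carry out of the most significant bit
    return result, n, z, c


def bit_reverse(value: int, bits: int) -> int:
    """Reverse the low `bits` bits of value (index scrambling used by FFTs)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


print(spad_add(0xFFFF, 1))                      # wraps to zero with a carry
print([bit_reverse(i, 3) for i in range(8)])    # 8-point scrambled order
```

A loop counter can be decremented the same way and the Z bit tested by the following branch.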
Figure 2-2 S-Pad Unit 
(block diagram: data pad bus (DB), s-pad register file, bit reverse, 
s-pad ALU, shifter, and the data pad address (DPA), memory address (MA), 
and table memory address (TMA) registers) 
2.4 FLOATING-POINT ADDER UNIT 


The floating-point adder, shown in Figure 2-3, performs addition or 
subtraction operations on the contents of the adder input registers (A1 
and A2). The operation is performed in two stages, each of which takes 
one machine cycle. 


In the first stage, the exponents of the two numbers are compared and 
the fractions are aligned by shifting the fraction of the smaller 
number right. The fractions are then added or subtracted. In the 
second stage, the resultant fraction is normalized and convergently 
rounded. 


Since the two stages are independent of each other, a new pair of 
numbers can be entered into A1 and A2 every AP cycle (167ns). The 
result is available for use two cycles later (333ns). 


In effect, the floating adder (FA) is a pipeline where new inputs can 
be entered into the pipeline stream every cycle. Initiation of an add 
operation loads the two numbers to be added into the A1 and A2 input 
registers. The previous adder input is pushed down the pipeline to the 
adder buffer register. One cycle later, the completed result (called 
FA) from the buffer is available for storage or use by another unit. 
Thus, a new add can be started every 167ns, and the result is ready 
333ns later. 


A1 may be loaded from data pad (DP), from the output of the floating 
multiplier (FM), or from table memory (TM). A2 may be loaded from data 
pad (DP), from the output of the floating adder (FA), or from main data 
memory (MD). 


The output of the floating adder (FA) may be directed to the multiplier 
(M2), to the adder (A2), to data pad (DP), or to memory input (MI). 
The operations performed by the floating adder are: 

• A1+A2 
• A1-A2 
• A2-A1 
• A1 EQV A2 
• A1 AND A2 
• A1 OR A2 
• convert A2 from signed magnitude to 2's complement format 
• convert A2 from 2's complement to signed magnitude format 
• scale A2 
• absolute value of A2 
• fix A2 

Four condition bits in the AP status register are set or cleared by the 
floating adder depending upon the current result: 


FZ      Set to one if result is zero, else 
        cleared to zero. 


FN      Set to one if result is negative, else 
        cleared to zero. 


FO      Set to one if exponent overflow occurred. The 
        result is forced to the signed maximum value. 


FU      Set to one if exponent underflow occurred. 
        The result is forced to zero. 
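
The overflow and underflow handling can be sketched as follows. This is an assumption-laden illustration, not the hardware algorithm: the limits are the dynamic-range figures quoted in the internal floating-point format section, the function name `force_result` is hypothetical, and the flags are returned per operation rather than held in a status register.

```python
import math

# Limits quoted for the internal format: 0.5 * 2^-512 and (1 - 2^-27) * 2^511.
SMALLEST = 0.5 * 2.0 ** -512
LARGEST = (1 - 2.0 ** -27) * 2.0 ** 511


def force_result(value: float):
    """Return (result, FO, FU) after applying the overflow/underflow rules."""
    magnitude = abs(value)
    if magnitude > LARGEST:
        # Exponent overflow: result forced to the signed maximum value.
        return math.copysign(LARGEST, value), True, False
    if 0 < magnitude < SMALLEST:
        # Exponent underflow: result forced to zero.
        return 0.0, False, True
    return value, False, False


print(force_result(-1e200))   # clamps to -LARGEST and reports FO
print(force_result(1e-200))   # flushes to zero and reports FU
```

A fuller model would keep FO and FU set until the program clears them.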
The overflow and underflow bits remain set until cleared by the 
program. These bits may be tested by the instruction after the 
floating adder result is completed (i.e., three cycles after the 
floating adder operation is initiated). 


Figure 2-3 Floating-Point Adder Unit 
(block diagram: A1 and A2 input selection; stage 1, ALIGN FRACTIONS AND 
ADD; BUFFER; stage 2, NORMALIZE AND ROUND; result FA routed to M2, A2, 
MI, DPX, and DPY) 
2.5 FLOATING-POINT MULTIPLIER UNIT 


The floating multiplier, as illustrated in Figure 2-4, forms the 
product of the two multiplier input registers (M1 and M2). The product 
is formed in three stages, each of which takes one machine cycle. 


In the first stage, the 56-bit product of the two 28-bit fractions is 
partially completed. The second stage completes the product of the 
fractions. In the third and final stage, the exponents are added, and 
the mantissa product is normalized and convergently rounded. 


The floating multiplier, like the floating adder, is organized like a 
pipeline. Initiation of a multiply loads the two numbers to be 
multiplied into the M1 and M2 input registers. The two previous 
multiplier inputs are pushed down the pipeline to buffer 2 and buffer 
3, respectively. One cycle later, the result from buffer 3 is 
available for storage or use by another unit. 


Thus, a new product can be started every 167ns, and the result is ready 
500ns later. 


M1 can be loaded from data pad (DPX or DPY), from the output of the 
floating multiplier (FM), or from table memory (TM). M2 is loaded from 
data pad (DPX or DPY), from the output of the floating adder (FA), 
or from the main data memory (MD). 


Two error bits in the AP status register are affected by the floating 
multiplier: 


FO Set if exponent overflow occurred. The result 
is forced to the signed maximum value. 


FU Set if exponent underflow occurred. The result 
is forced to zero. 
Figure 2-4 Floating Multiplier 
(block diagram: M1 and M2 input selection; stage 1, ADD EXPONENTS, START 
PRODUCT OF FRACTIONS; BUFFER 2; stage 2, COMPLETE PRODUCT OF FRACTIONS; 
BUFFER 3; stage 3, NORMALIZE AND ROUND; result FM routed to M1, A1, MI, 
DPX, and DPY) 
2.6 DATA PAD UNIT 


Data pad, illustrated in Figure 2-5, consists of two fast accumulator 
blocks (each with 32 floating-point locations) called data pad X (DPX) 
and data pad Y (DPY). In a single machine cycle, the contents of one 
location from each data pad can be read out and used. In addition, 
data can also be stored into one location in each data pad in the same 
cycle. For example, in a single instruction (167ns), a multiply can be 
initiated specifying one argument from DPX and another from DPY; an 
adder result (FA) can be stored into a DPX location, and a data element 
in main data stored into a DPY location. On the very next instruction, 
similar multiple data pad accessing could be accomplished again. 


The two memories are addressed via a combination of the data pad 
address (DPA) register and four index field values contained in a given 
instruction word. DPA can be thought of as a base address register or 
stack pointer. It can be loaded from the s-pad (SPFN) or its contents 
can be incremented or decremented by one. 


For a given read or write operation (i.e., reading from data pad X) an 
index value contained in the instruction is added to the current 
contents of DPA to give the effective address for that particular 
operation. The four index fields (one each for read DPX, read DPY, 
write DPX, and write DPY) are each three bits wide and have a range 
from -4 to +3 relative to DPA. 


Data from either data pad can be used by the multiplier (Ml, M2), adder 
(Al, A2), or memory input (MI). Data can be stored into data pad from 
the adder (FA), multiplier (FM), s-pad function output (SPFN), command 
buffer value (VALUE), or from data pad (DP). 
Figure 2-5 Data Pad 
(block diagram: data pad bus (DB); DPX and DPY, each with a write index 
and a read index applied to DPA; outputs from each pad to M1, M2, A1, 
A2, and DB) 
2.7 DATA MEMORY UNIT 


The data memory unit, as illustrated in Figure 2-6, is the primary data 
store for the AP. It is available in 38-bit wide 8K modules which have 
an interleaved cycle time of 333ns (for the standard memory) and 167ns 
(for the fast memory). 


The memory unit contains a main data memory (MD) buffer and a memory 
input (MI) buffer. Data read from memory is placed by the controller 
into MD, while data is written into memory from the MI. The memory 
address (MA) register points to the desired memory location. 


In referencing memory for read or write operations, the selected 
operation is initiated by making a change to the memory address (MA) 
register. The MA register can be loaded from the s-pad (SPFN) or its 
contents incremented or decremented by one. 


A write operation is specified by loading MI with the data to be 
written during the same instruction in which MA is changed. This data 
is then written into memory from MI during the next two AP cycles. 
Data can be loaded into MI from the floating adder (FA), floating 
multiplier (FM), data pad (DP), main data memory (MD), table memory 
(TM), the input bus (INBS), s-pad function (SPFN), or the command 
buffer value (VALUE). A memory operation can be initiated every other 
cycle. The intervening cycle can be used for any other AP function 
except another memory initiate. 


When a memory read is initiated, the requested memory data is placed by 
the memory controller into the main data memory (MD) register three 
cycles after the request is made. Two instructions after the read 
request, another memory operation can be initiated. Again, the 
intervening cycle can be used for any non-memory function. Data in MD 
can be used by the floating adder (A2), floating multiplier (M2), or 
data pad (DP). 


To optimize the operation of the AP, it is necessary for the programmer 
to look ahead and initiate memory reads prior to the actual time that 
arguments from data memory are used in a calculation. 


The system provides a memory lock-out which serves to ensure that 
erroneous reads and writes of memory do not occur. If a memory 
initiate occurs while memory is busy, further program execution is 
halted until the previous memory cycle is completed. 
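
The look-ahead and lock-out rules can be illustrated with a small timing sketch. It is a simplification under stated assumptions: memory is treated as busy for two cycles after each initiate, read data appears in MD three cycles after the initiate, and a lock-out delays every later instruction by the same amount. The function name and parameters are hypothetical.

```python
def schedule_reads(initiate_cycles, busy_cycles=2, read_latency=3):
    """Return (actual_initiate_cycle, cycle_data_is_in_MD) for each read.

    initiate_cycles lists the cycles at which the program, if never stalled,
    would initiate a memory read.
    """
    results = []
    stall = 0             # total delay accumulated from lock-outs so far
    memory_free_at = 0    # first cycle at which a new initiate is accepted
    for requested in initiate_cycles:
        start = requested + stall
        if start < memory_free_at:
            # Memory still busy: execution halts until it is free.
            stall += memory_free_at - start
            start = memory_free_at
        memory_free_at = start + busy_cycles
        results.append((start, start + read_latency))
    return results


print(schedule_reads([0, 2, 4]))   # every other cycle: no stalls
print(schedule_reads([0, 1, 2]))   # back-to-back requests are stretched out
```

Both calls produce the same schedule, which shows why the intervening cycles are best filled with non-memory work.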
Figure 2-6 Data Memory Unit 
(block diagram: memory input sources INBS, VALUE, DPX, DPY, MD, SPFN, 
and TM on the data pad bus (DB); MA; main data memory; MD outputs to DB, 
A2, and M2) 
2.8 TABLE MEMORY UNIT 


The repeated use of standard constants (such as complex roots of unity 
and transcendental values) in signal processing routines dictates their 
ready availability to the programmer. A separate table memory, as 
illustrated in Figure 2-7, eliminates memory accessing conflicts by 
allowing data values (constants) to be placed in separate memory banks. 


Values read from table memory are placed by the controller into the 
table memory buffer register. The table memory address (TMA) register 
serves as a pointer to the desired location. 


A table memory read is initiated by changing the contents of TMA either 
by loading a value from the s-pad (SPFN) or by incrementing or 
decrementing the contents of TMA. 


A new table value may be requested every machine cycle. This value is 
available for use two cycles later. The value can be used by the 
floating adder (Al), floating multiplier (Ml), or data pad (DP). 


In FFT mode (i.e., when an FFT is being computed), the address in TMA is 
interpreted by the hardware to be an angle which points to the 
appropriate root of unity for a particular step in the algorithm. This 
allows the full table of roots of unity to be compressed into a single 
quadrant of cosines. 
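
To see why a single quadrant of cosines is enough, the following Python sketch recovers both the cosine and the sine of any angle index from a first-quadrant table using symmetry. It illustrates the idea only; the mapping the hardware actually applies to TMA is not described here, and the names `quarter_cosine_table` and `root_of_unity` are hypothetical.

```python
import math


def quarter_cosine_table(n: int):
    """cos(2*pi*k/n) for k = 0 .. n/4 only (n must be a multiple of 4)."""
    return [math.cos(2 * math.pi * k / n) for k in range(n // 4 + 1)]


def root_of_unity(table, n: int, k: int):
    """Return (cos, sin) of 2*pi*k/n using only the first-quadrant table."""
    quarter, half = n // 4, n // 2

    def cos_lookup(j: int) -> float:
        j %= n
        if j > half:
            j = n - j                  # cos(x) = cos(2*pi - x)
        if j <= quarter:
            return table[j]
        return -table[half - j]        # cos(x) = -cos(pi - x)

    # sin(x) = cos(x - pi/2), i.e. the same table a quarter turn back.
    return cos_lookup(k), cos_lookup(k - quarter)


table = quarter_cosine_table(16)
print(len(table), root_of_unity(table, 16, 6))
```

For a 16-point transform the table holds 5 entries instead of 16 complex values.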


Refer to Programmer’s Reference Manual Part One (FPS 860-7319-000) for 
information on TMRAM. 
Figure 2-7 Table Memory 
(block diagram: data pad bus (DB); table memory ROM, TMA, and table 
memory RAM; outputs to A1, M1, and DB) 
2.9 INTERNAL FLOATING-POINT FORMAT 


Floating-point data internal to the AP is represented as follows: 


      EXPONENT              MANTISSA 

   0           9   10                  37 
   E0  . . .  E9   M0      . . .      M27 

where: 
mantissa     28-bit 2's complement fraction 
exponent     10-bit binary exponent, biased by 512 


The value of a floating-point number in this format is defined as: 


mantissa * 2^(exponent - 512) 


The dynamic range of this format is from 0.5 * 2^-512 to 
(1 - 2^-27) * 2^511, or from 3.7*10^-155 to 6.7*10^153. 


The 28-bit fraction, combined with the convergent rounding algorithm 
used in the floating adder and multiplier, gives a maximum relative 
error of 7.5*10^-9 per arithmetic operation. This is a precision of 8.1 
decimal digits. As a comparison, unrounded IBM 360 format gives only 
6.0 decimal digits of arithmetic accuracy. 
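
The range and precision figures can be checked with a short Python sketch. It assumes the two fields are handed over as separate raw integers (how they are packed into a word is not modelled), and the function name `decode` is hypothetical.

```python
import math
from fractions import Fraction

MANT_BITS, BIAS = 28, 512


def decode(exponent_field: int, mantissa_field: int) -> Fraction:
    """Value of a number from its raw 10-bit exponent and 28-bit mantissa."""
    if mantissa_field >= 1 << (MANT_BITS - 1):       # sign bit set
        mantissa_field -= 1 << MANT_BITS             # two's complement
    mantissa = Fraction(mantissa_field, 1 << (MANT_BITS - 1))  # in [-1, 1)
    return mantissa * Fraction(2) ** (exponent_field - BIAS)


smallest = decode(0, 1 << (MANT_BITS - 2))            # 0.5 * 2^-512
largest = decode(1023, (1 << (MANT_BITS - 1)) - 1)    # (1 - 2^-27) * 2^511
relative_error = 2.0 ** -27                           # one part in 2^27

print(f"{float(smallest):.1e} to {float(largest):.1e}")
print(f"{relative_error:.1e}, {-math.log10(relative_error):.1f} digits")
```

Exact rational arithmetic is used so that the extreme values are not themselves rounded while being checked.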


The convergent rounding hardware rounds up when the magnitude of the 
remainder is greater than one-half of the least significant bit of the 
mantissa. This serves to minimize truncation errors in long series of 
arithmetic calculations. 


Format conversion between host format and AP format occurs in the 
interface and in the floating adder unit. The dynamic range of the 
internal format is large enough to accommodate IBM 360 format and other 
host formats. The extended precision of the AP internal format ensures 
that accuracy is maintained during critical stages of data analysis. 
CHAPTER 3 


PROGRAMMING CONSIDERATIONS 


3.1 INTRODUCTION 


This chapter provides an introduction to programming the AP. The 
principal operations which control each of the six functional units are 
described below. A complete listing of the AP instruction word fields 
can be found in Appendix B. 


In the coding examples, a semicolon (;) is used to separate 
operations within a complete instruction word. A comma (,) separates 
operands. A quote mark (") is used to denote a comment. A less than 
sign (<) is used to mean "←" (replaced by) where the operation 
involved is a data transfer. 


3.2 FLOATING-POINT ADDER 


The following sections describe the floating-point adder. 
3.2.1 FLOATING ADDER OPERATIONS 


Floating adder operations are initiated by the following instructions: 


instruction   operands   operation initiated 

FADD          A1,A2      A1+A2 
FSUB          A1,A2      A1-A2 
FSUBR         A1,A2      A2-A1 
FAND          A1,A2      A1 AND A2 
FOR           A1,A2      A1 OR A2 
FEQV          A1,A2      A1 EQV A2 
FABS          A2         ABS(A2) 
FIX           A2         Convert A2, floating-point 
                         number to fixed integer. 
FSM2C         A2         Convert A2, signed magnitude 
                         to 2's complement. 
F2CSM         A2         Convert A2, 2's complement to 
                         signed magnitude. 
FSCALE        A2         Scale A2. 


where A1 and A2 are any of the following data sources: 


A1:   FM      floating multiplier result 
      DPX     data pad X accumulator 
      DPY     data pad Y accumulator 
      TM      last data read from table memory 
      ZERO    floating-point zero 
A2:   FA      floating adder result 
      DPX 
      DPY 
      MD      last data read from data memory 
      ZERO 
Any data source listed under A1 may be combined with any data source 
listed under A2. For example, to add a number from data pad X to 
another from data pad Y: 


FADD DPX,DPY          "DPX+DPY 


To subtract a number read out of data memory from a constant in table 
memory: 


FSUB TM,MD            "TM-MD 


A reverse subtract changes the order of the subtraction; i.e., 


FSUBR TM,MD           "MD-TM 


subtracts a constant from table memory from a number in data memory. 


To negate a number from DPX: 


FSUB ZERO,DPX         "0.0 - DPX = -DPX 


To take the absolute value of a number from data memory: 


FABS MD               "ABS (MD) 


To fix (convert from floating-point to integer format) a number from 
DPY: 


FIX DPY               "FIX (DPY) 
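
The operand order of these instructions can be restated in Python. This is only a restatement of the operations column for the arithmetic cases; the values chosen for `tm`, `md`, and `dpx` are illustrative.

```python
# Each entry maps a mnemonic to the arithmetic it initiates on (A1, A2).
ADDER_OPS = {
    "FADD":  lambda a1, a2: a1 + a2,
    "FSUB":  lambda a1, a2: a1 - a2,
    "FSUBR": lambda a1, a2: a2 - a1,    # reverse subtract
    "FABS":  lambda a1, a2: abs(a2),    # single-operand forms use A2 only
}

ZERO = 0.0
tm, md, dpx = 10.0, 4.0, 2.5            # illustrative operand values

print(ADDER_OPS["FSUB"](tm, md))        # FSUB TM,MD    gives TM-MD
print(ADDER_OPS["FSUBR"](tm, md))       # FSUBR TM,MD   gives MD-TM
print(ADDER_OPS["FSUB"](ZERO, dpx))     # FSUB ZERO,DPX gives -DPX
```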
3.2.2 ADDER PIPELINE 

The floating adder is a two-stage pipeline. A FADD instruction loads 
the designated operands into the A1 and A2 registers. The previous 
contents of A1 and A2 are pushed down the pipeline to the buffer 
register. One AP cycle later, the new contents of the buffer have been 
normalized and rounded and are then available for use or storage 
elsewhere. 


Example 1 illustrates how the adder pipeline works, where A,B,...,G,H 
are floating-point numbers to be added. 


Example 1 


ADDER PIPELINE 
(table columns: TIME, A1,A2, BUFFER, RESULT (FA)) 
The FADD without arguments in cycle 5 is used only to push the last 
computation into the buffer register and hence to the end of the 
pipeline. Thus, it is a dummy add because the results are unimportant 
and are never used. In Example 1, the floating-point adds are 
completed in one microsecond. During cycles 2 through 4, when the 
pipeline is full, adds are done every 167ns, the maximum rate. The 
completed results as they come out of the adder pipeline are referred 
to by the mnemonic FA. FA is dynamic in the sense that it must be used 
or stored elsewhere before being changed by the next floating adder 
instruction. The programmer, however, has complete control over the 
pipeline. Arguments advance only when pushed through the pipeline by 
floating adder instructions. 
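
The pipeline behaviour described above can be sketched with a small timing model. It tracks only when values move, not normalization or rounding, and the class and method names are hypothetical. Four adds are followed by a dummy FADD in cycle 5 and an instruction with no adder operation in cycle 6.

```python
class AdderPipeline:
    """Timing-only model of the two-stage floating adder."""

    def __init__(self):
        self.inputs = None   # contents of A1 and A2
        self.buffer = None   # sum waiting to emerge as FA

    def step(self, operands=None, push=True):
        """Execute one instruction and return the FA value it can use.

        push=True models any floating adder instruction (operands=None is a
        dummy FADD); push=False models an instruction with no adder field.
        """
        fa = self.buffer
        if push:
            self.buffer = None if self.inputs is None else sum(self.inputs)
            self.inputs = operands
        return fa


pipe = AdderPipeline()
program = [((1, 2), True), ((3, 4), True), ((5, 6), True), ((7, 8), True),
           (None, True),     # cycle 5: dummy FADD pushes the last sum along
           (None, False)]    # cycle 6: only reads FA
trace = [pipe.step(operands, push) for operands, push in program]
print(trace)
```

In the trace, the first sum is visible in cycle 3 and one new sum appears per cycle after that, which matches the two-cycle latency stated in the text.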


3.2.3 AN EXAMPLE 

A complete computational sequence to do the vector sum 
A(i) = A(i) + B(i), i=0,1,2,3, is shown in Example 2. A(i) is stored 
in data pad X locations 0 through 3, and B(i) is stored in data pad Y 
locations 0 through 3. 


Example 2 


1. FADD DPX(0),DPY(0)               "Do A(0)+B(0) 
2. FADD DPX(1),DPY(1)               "Do A(1)+B(1) 
3. FADD DPX(2),DPY(2); DPX(0)<FA    "Do A(2)+B(2), A(0)+B(0) is now 
                                    "done, save it in A(0) 
4. FADD DPX(3),DPY(3); DPX(1)<FA    "Do A(3)+B(3), A(1)+B(1) is now 
                                    "done, save it in A(1) 
5. FADD; DPX(2)<FA                  "Push adder; save A(2)+B(2) in A(2) 
6. DPX(3)<FA                        "Save A(3)+B(3) in A(3) 
Example 3 is a chart of this computation showing the state of the adder 
pipeline and data pad after each instruction is executed. 


Example 3 


ADDER PIPELINE AND DATA PAD X 
(table columns: CYCLE, A1,A2, BUFFER, ADDER RESULT, DATA PAD X 
locations 0 through 3) 



3.2.4 FLOATING ADDER TESTS 


Table 3-1 lists the conditional branches that test the floating adder 
result (FA): 


Table 3-1 Floating Adder Tests 


BR LOOP        "Branch unconditionally to program 
               "location "LOOP" 
BFEQ LOOP      "Branch if FA=0.0 
BFNE LOOP      "Branch if FA≠0.0 
BFGE LOOP      "Branch if FA≥0.0 
BFGT LOOP      "Branch if FA>0.0 
The branches test FA one instruction cycle after it is ready for use. 
That is, an adder result may be tested one cycle after it comes out of 
the adder pipeline. This is shown in Example 4. 


Example 4 

1. FSUB DPX,DPY       "Do a computation 
2. FADD               "Push the result out 
3. DPX<FA             "Save the result 
4. BFEQ LOOP          "Test the result here (branch to 
                      "location "LOOP" if result was 
                      "zero) 


Compound tests may also be made. Test MD to see if it is between a 
lower limit contained in DPX(1) and an upper limit in DPX(2) (i.e., 
see if DPX(1)≤MD≤DPX(2)). This is shown in Example 5. 


Example 5 

1. FSUBR DPX(2),MD    "Do MD-DPX(2) 
2. FSUB DPX(1),MD     "Do DPX(1)-MD 
3. FADD               "Push first test result out 
4. BFGT BIG           "Was too big 
5. BFGT SMALL         "Was too small 
6. ...                "OK 
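
The logic of the compound test, stripped of pipeline timing, can be written as a Python sketch. The function name `range_check` and its return strings are hypothetical and stand in for the branch targets.

```python
def range_check(md: float, lower: float, upper: float) -> str:
    """Mirror the two subtractions and two greater-than tests of Example 5."""
    too_big = md - upper       # FSUBR DPX(2),MD computes MD-DPX(2)
    too_small = lower - md     # FSUB  DPX(1),MD computes DPX(1)-MD
    if too_big > 0.0:          # first BFGT: branch to BIG
        return "BIG"
    if too_small > 0.0:        # second BFGT: branch to SMALL
        return "SMALL"
    return "OK"                # falls through when lower <= MD <= upper


print(range_check(5.0, 1.0, 10.0), range_check(11.0, 1.0, 10.0),
      range_check(0.5, 1.0, 10.0))
```

Because both tests are strict greater-than comparisons, a value equal to either limit falls through as in range.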


The branches are made relative to the current program source address 
(PSA) with a 5-bit displacement value. This means that the conditional 
branch target address must be within -20 to +17 octal locations of the 
current instruction. 




3.2.5 FLOATING-POINT LOGICAL OPERATIONS 

Instructions FAND, FOR, and FEQV perform logical operations on 
floating-point numbers. Exponent alignment occurs as for a normal 
floating-point add. The two mantissas are then combined using the 
specified logical operation. The result is then normalized and 
rounded. 


3.3 FLOATING-POINT MULTIPLIER 


The following sections describe the floating-point multiplier. 


3.3.1 MULTIPLY INSTRUCTION 

Floating-point multiplies are initiated by the following instruction: 

FMUL M1,M2 


which initiates a multiply between M1 and M2, where M1 and M2 are any 
of the following data sources: 


M1:   FM      floating multiplier result 
      DPX     data pad X accumulator 
      DPY     data pad Y accumulator 
      TM      last data read from table memory 
M2:   FA      floating adder result 
      DPX 
      DPY 
      MD      last data read from data memory 


Thus, any of the data sources listed under M1 can be multiplied by any 
of the data sources in M2. For example, to multiply a number read from 
data memory by a constant from table memory: 

FMUL TM,MD            "TM * MD 


or, to multiply a number in data pad X by another number in data pad Y: 


FMUL DPX,DPY          "DPX * DPY 




3.3.2 MULTIPLIER PIPELINE 

The floating multiplier is a three-stage pipeline. An FMUL instruction 
loads the specified operands into the M1 and M2 registers. The two 
previous partially-completed products are pushed down the pipeline to 
buffer 2 and buffer 3, respectively. One AP cycle later, the new 
contents of buffer 3 have been normalized and rounded and are then 
available for use or storage elsewhere. 

The instruction sequence shown in Example 6 illustrates how the 
multiplier pipeline works where A,B,...,G,H are floating-point numbers 
to be multiplied together. 


Example 6 


MULTIPLIER PIPELINE 
(table columns: TIME, CYCLE, MULTIPLIER INSTRUCTION, M1,M2, BUFFER 2, 
BUFFER 3, RESULT (FM)) 


The FMULs in cycles 5 and 6 are dummy multiplies used to push the last 
two computations to the end of the pipeline. In Example 6, four 
floating-point multiplies are completed in 1.0us. During cycles 3 and 
4, while the pipeline is full, products are done every 167ns, the 
maximum rate. 


The completed products as they come out of the multiplier pipeline are 
referred to by the mnemonic FM. FM is dynamic in that it must be used 
or stored before being changed by the next FMUL instruction. 
3.3.3 AN EXAMPLE 


A computation example to square the elements in a vector is shown in 
Example 7. 


Example 7 


A(i) = A(i)*A(i), i=0,1,2,3. A(i) is stored in Data Pad X. 


1. FMUL DPX(0),DPX(0)               "Do A(0)^2 
2. FMUL DPX(1),DPX(1)               "Do A(1)^2 
3. FMUL DPX(2),DPX(2)               "Do A(2)^2 
4. FMUL DPX(3),DPX(3); DPX(0)<FM    "Do A(3)^2, save A(0)^2 
5. FMUL; DPX(1)<FM                  "Save A(1)^2 
6. FMUL; DPX(2)<FM                  "Save A(2)^2 
7. DPX(3)<FM                        "Save A(3)^2 
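
The seven-instruction schedule can be traced with a timing-only Python sketch of the three-stage multiplier. It is an illustration under the pipeline rules stated in section 3.3.2, with no normalization or rounding, and the function name `square_vector` is hypothetical.

```python
def square_vector(dpx):
    """Run the Example 7 schedule on a 4-element list standing in for DPX."""
    inputs = buf2 = buf3 = None     # M1/M2 registers, buffer 2, buffer 3
    # (DPX index fed to FMUL or None for a dummy, FMUL issued?, DPX index
    #  that receives FM or None)
    program = [(0, True, None), (1, True, None), (2, True, None),
               (3, True, 0), (None, True, 1), (None, True, 2),
               (None, False, 3)]
    for src, fmul, dst in program:
        fm = buf3                                   # FM usable this cycle
        operand = None if src is None else dpx[src] # read before any write
        if fmul:
            buf3 = buf2
            buf2 = None if inputs is None else inputs[0] * inputs[1]
            inputs = None if operand is None else (operand, operand)
        if dst is not None:
            dpx[dst] = fm
    return dpx


print(square_vector([1.5, 2.0, -3.0, 4.0]))
```

Each location is overwritten only after its original value has entered the pipeline, which is why the results can be stored back in place.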
Example 8 illustrates this computation showing the state of the 
multiplier pipeline and data pad X after each instruction is executed. 


Example 8 


MULTIPLIER PIPELINE AND DATA PAD X 
(table columns: CYCLE, M1,M2, BUFFER 2, BUFFER 3, MULTIPLIER RESULT 
(FM), DATA PAD X locations) 
3.3.4 MULTIPLY-ADDS 


The full floating-point computational power of the AP is utilized when 
a process involving both multiplies and adds is considered. The dot 
product of two eight-element vectors, A.B = SUM A(i)B(i), 
i = -4, -3, ..., 2, 3, where A(i) is in Data Pad X and B(i) is in 
Data Pad Y, is formed in Example 9. 


Example 9 


Fill the Multiplier Pipeline 

 1. FMUL DPX(-4),DPY(-4)       "Do A(-4)B(-4) 
 2. FMUL DPX(-3),DPY(-3)       "Do A(-3)B(-3) 
 3. FMUL DPX(-2),DPY(-2)       "Do A(-2)B(-2) 

Fill the Adder Pipeline 

 4. FMUL DPX(-1),DPY(-1);      "Do A(-1)B(-1). A(-4)B(-4) is 
    FADD FM,ZERO               "now done, save it in 
                               "the adder. 
 5. FMUL DPX(0),DPY(0);        "Do A(0)B(0). A(-3)B(-3) is now 
    FADD FM,ZERO               "done, save it in the 
                               "adder. 

Both Pipelines Full 

 6. FMUL DPX(1),DPY(1);        "Do A(1)B(1). A(-2)B(-2) is now 
    FADD FM,FA                 "coming out of the multiplier, 
                               "and A(-4)B(-4) from the adder, add 
                               "them together. 
 7. FMUL DPX(2),DPY(2);        "Do A(2)B(2). A(-1)B(-1) is now coming 
    FADD FM,FA                 "out of the multiplier, and 
                               "A(-3)B(-3) from the adder, add 
                               "them together. 
 8. FMUL DPX(3),DPY(3);        "Do A(3)B(3). A(0)B(0) is now coming 
    FADD FM,FA                 "out of the multiplier, and 
                               "(A(-4)B(-4)+A(-2)B(-2)) from the 
                               "adder, add them together. 

Empty the Multiplier Pipeline 

 9. FMUL; FADD FM,FA           "A(1)B(1) is coming out of the 
                               "multiplier, and (A(-3)B(-3) 
                               "+A(-1)B(-1)) from the adder, 
                               "add them together. 
10. FMUL; FADD FM,FA           "A(2)B(2) is coming out of the 
                               "multiplier, and (A(-4)B(-4) 
                               "+A(-2)B(-2)+A(0)B(0)) from the 
                               "adder, add them together. 
11. FADD FM,FA                 "A(3)B(3) is coming out of the 
                               "multiplier, and 
                               "(A(-3)B(-3)+A(-1)B(-1)+A(1)B(1)) 
                               "from the adder, add 
                               "them together. 

Empty the Adder Pipeline 

12. FADD; DPX(3)<FA            "(A(-4)B(-4)+A(-2)B(-2)+A(0)B(0)+A(2)B(2)) 
                               "is coming out of the 
                               "adder, save it in DPX(3). 
13. FADD DPX(3),FA             "(A(-3)B(-3)+A(-1)B(-1)+A(1)B(1)+A(3)B(3)) 
                               "is coming out of the 
                               "adder, add it to 
                               "(A(-4)B(-4)+A(-2)B(-2)+A(0)B(0)+A(2)B(2)) 
                               "which was saved in DPX(3). 
14. FADD                       "Push result out of adder 
15. DPX(3)<FA                  "The result: (A(-4)B(-4)+ 
                               "A(-3)B(-3)+A(-2)B(-2)+A(-1)B(-1)+ 
                               "A(0)B(0)+A(1)B(1)+A(2)B(2)+A(3)B(3)), 
                               "saved in DPX(3). 
In accumulating the sum-of-products, the even term sum is kept in 
one-half of the adder pipeline and the odd term sum in the other half. 
During cycles 5 through 7 when both pipelines are full, floating-point 
multiply adds are computed every 167ns. This is 12 million 
floating-point computations per second. A longer sum-of-products 
calculation involving more terms would come closer to maintaining this 
maximum computation rate, because in this short example nearly all of 
the time was spent filling and emptying pipelines. Even so, the seven 
adds and eight multiplies take 15 cycles (2.5us) to complete (an 
overall rate of 333ns per floating-point multiply add). 
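
The interaction of the two pipelines in Example 9 can be checked with a timing-only Python sketch. It follows the 15-instruction schedule under the pipeline rules of sections 3.2.2 and 3.3.2; the operation labels, the variable `saved` (standing in for DPX(3) after instruction 12), and the function name are hypothetical, and list indices 0 through 7 stand for i = -4 through 3.

```python
def dot_product_schedule(a, b):
    """Execute the Example 9 schedule and return the value left in DPX(3)."""
    m_in = buf2 = buf3 = None      # multiplier: M1/M2, buffer 2, buffer 3
    a_in = a_buf = None            # adder: A1/A2, buffer
    saved = None
    # (element index fed to FMUL or None, FMUL issued?, adder/store action)
    program = [
        (0, True, "none"), (1, True, "none"), (2, True, "none"),
        (3, True, "fm+zero"), (4, True, "fm+zero"),
        (5, True, "fm+fa"), (6, True, "fm+fa"), (7, True, "fm+fa"),
        (None, True, "fm+fa"), (None, True, "fm+fa"),
        (None, False, "fm+fa"),
        (None, False, "push,save"),
        (None, False, "saved+fa"),
        (None, False, "push"),
        (None, False, "save"),
    ]
    for src, fmul, action in program:
        fm, fa = buf3, a_buf                       # results usable this cycle
        if fmul:                                   # advance the multiplier
            buf3 = buf2
            buf2 = None if m_in is None else m_in[0] * m_in[1]
            m_in = None if src is None else (a[src], b[src])
        if action not in ("none", "save"):         # advance the adder
            operands = {"fm+zero": (fm, 0.0), "fm+fa": (fm, fa),
                        "saved+fa": (saved, fa)}.get(action)
            a_buf = None if a_in is None else a_in[0] + a_in[1]
            a_in = operands
        if action in ("push,save", "save"):        # DPX(3)<FA
            saved = fa
    return saved


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(dot_product_schedule(a, b))
```

The even-indexed and odd-indexed products accumulate in alternate adder slots and are combined only by instruction 13, as described above.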


Example 10 summarizes the computation as a further aid in understanding 
the multiply add interaction in the sum-of-products computation of 
Example 9. 
Example 10 


MULTIPLIER AND ADDER PIPELINES 
(table of pipeline contents by CYCLE; columns include M1,M2 and the 
later pipeline stages) 


NOTE 
ESn is n terms of the even term sum: A(i)B(i), i = -4, -2, 0, 2 
OSn is n terms of the odd term sum: A(i)B(i), i = -3, -1, 1, 3 
3.4 DATA PAD


The following sections describe the data pad.


3.4.1 DATA PAD ADDRESSING


Data pad is a block of 64 high-speed accumulators used to store
intermediate results during a computation. In any given AP
instruction, the programmer has 16 of the data pad accumulators to work
with: eight in data pad X and eight in data pad Y. They are addressed
relative to the current value of the data pad address register (DPA),
which functions as a base register for data pad. For example, if DPA
has a value of 24 (octal), locations 20 through 27 (octal) would be
available for use. This is illustrated in Figure 3-1.


[Figure: data pad locations 20 through 27 (octal) marked "AVAILABLE FOR
USE WHEN DPA = 24 (octal)"]


Figure 3-1 Data Pad Address


A displacement value from -4 to +3 may be specified when using DPX and
DPY (i.e., if DPA=24 octal):


DPX(3)   means DPX location 24+3=27
DPY(-4)  means DPY location 24-4=20
DPX(0)   means DPX location 24+0=24
DPY      means DPY location 24+0=24
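
The base-plus-displacement rule can be sketched in a few lines. This is an illustration, not a description of the hardware: the function name is hypothetical, and wrapping the sum modulo 40 (octal) is an assumption based on the data pad addressing being circular.

```python
DATA_PAD_SIZE = 0o40  # 32 locations (0 through 37 octal) in each of DPX and DPY

def data_pad_location(dpa: int, displacement: int = 0) -> int:
    """Return the data pad location selected by DPA plus a displacement."""
    if not -4 <= displacement <= 3:
        raise ValueError("displacement must be in the range -4 to +3")
    # Assumption: the sum wraps around, as DPA itself does when incremented.
    return (dpa + displacement) % DATA_PAD_SIZE

for disp in (3, -4, 0):
    print(f"displacement {disp:+d} -> location {data_pad_location(0o24, disp):o} (octal)")
```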
Four separate displacements are provided, one each for reading and
writing DPX and DPY. Thus, four separate locations in data pad may be
used in a given instruction. With DPA=24 (octal), the following
instruction occurs in one cycle:


FADD DPX(3),MD; FMUL TM,DPY(-2); DPX(-3)<FA; DPY(1)<FM
     (read DPX)         (read DPY)  (write DPX)  (write DPY)


This would add DPX location 27 to the last data read from data memory,
multiply the last data read from table memory by the contents of DPY
location 22, store the results of a previous add into DPX location 21,
and store the results of a previous multiply into DPY location 25.


All 64 locations of data pad are accessed by changing the DPA pointer: 


INCDPA    "Increments DPA by 1
DECDPA    "Decrements DPA by 1
SETDPA    "Loads DPA with the current s-pad
          "function (SPFN, refer to section 3.7)


Changes in DPA take effect the next instruction after they occur (i.e.,
if DPA=24):


FADD DPX(0),DPY(0); INCDPA    "DPA is still 24 so
                              "DPX 24 is added to
                              "DPY 24

FADD DPX(0),DPY(0); INCDPA    "Now DPA=25, so
                              "DPX 25 is added to DPY 25

FADD DPX(0),DPY(0)            "Now DPA=26, so
                              "DPX 26 is added to
                              "DPY 26


Thus, by successively incrementing DPA, the data pad can be used as a 
queue; or by properly incrementing and decrementing DPA, the data pad 
can be used as a stack. Data pad addressing is circular. That is, with
successive increments of DPA the next location after 37 (octal) is 0;
with successive decrements of DPA the next location after 0 is 37
(octal).
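
The one-instruction delay and the wraparound can be modeled together. The sketch below is a simplified illustration with a hypothetical function name; it tracks only the DPA value that each instruction sees and ignores everything else in the instruction word.

```python
DATA_PAD_SIZE = 0o40  # DPA counts 0 through 37 (octal) and then wraps

def dpa_seen_by_each_instruction(dpa, dpa_ops):
    """dpa_ops holds "INCDPA", "DECDPA", or None for each instruction in turn.

    Returns the DPA value that each instruction uses for its operands.
    """
    seen = []
    for op in dpa_ops:
        seen.append(dpa)  # operands use DPA as it stood when the instruction began
        if op == "INCDPA":
            dpa = (dpa + 1) % DATA_PAD_SIZE
        elif op == "DECDPA":
            dpa = (dpa - 1) % DATA_PAD_SIZE
    return seen

# The three FADD instructions shown above, starting with DPA=24 (octal).
print([oct(v) for v in dpa_seen_by_each_instruction(0o24, ["INCDPA", "INCDPA", None])])
```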
3.4.2 WRITING INTO DATA PAD


Data may be stored into DPX and DPY from FA, FM, or DB (the data pad
bus):


DPX<FA    "Store adder result into DPX
DPX<FM    "Store multiplier result into DPX
DPX<DB    "Store data pad bus into DPX

and

DPY<FA    "Store into DPY
DPY<FM
DPY<DB

The following may be selected onto the data pad bus (DB): 


DB=ZERO     "Floating-point zero
DB=INBS     "Input Bus
DB=VALUE    "A 16-bit immediate value
DB=DPX      "DPX
DB=DPY      "DPY
DB=MD       "Last data read from data memory
DB=SPFN     "S-pad function (16-bit integer)
DB=TM       "Last data read from table memory


Thus, if DPA=24 (octal), the following instruction is possible:

DPX(3)<FA; DPY(-2)<DB; DB=MD


This stores the current adder result into DPX location 27 and stores
the last data read from the main data memory into DPY location 22 via
the data pad bus.
3.4.3 DATA PAD BUS


Data to be stored into DPX and DPY can be moved through three pathways: 
FM, FA, and DB. While FM and FA are fixed in meaning (output from the 
floating multiplier and adder, respectively), the data pad bus (DB) 
pathway can be connected to any one of eight possibilities depending 
upon the programmer’s choice. 

Examples: 


• MD is put into both DPX and DPY:


DPX<DB; DPY<DB; DB=MD


MD is put onto the data pad bus, and
the data pad bus is stored into DPX and DPY.


• MD is put into DPX and TM into DPY:


DPX<DB; DB=MD; DPY<DB; DB=TM


This is an error. Only one choice at a time
can be made for the data pad bus. This
double transfer would take two separate
instructions to accomplish.


• FA is stored into DPX and MD into DPY:


DPX<FA; DPY<DB; DB=MD


MD is put onto the data pad bus in order to get
it into DPY. FA goes directly into DPX.
To simplify notation, data transfers involving the data pad bus can be
written in a shorthand manner.


shorthand            longhand
DPX<MD; DPY<MD       DPX<DB; DPY<DB; DB=MD
DPX<MD; DPY<TM       DPX<DB; DB=MD; DPY<DB; DB=TM
                     (still an error no matter how it is written)
DPX<FA; DPY<MD       DPX<FA; DPY<DB; DB=MD


In the shorthand notation, choices for the data pad bus are not
explicitly indicated. Transfers are written as if there were a direct
connection between the source and destination while in fact it is the
data pad bus which does the connecting. Remember, however, that the
programmer is still making a data pad bus choice and only one choice is
allowed per instruction. Errors like the one shown above (where two
data pad bus choices are attempted) are detected and flagged by the
assembler.
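
The one-choice rule lends itself to a simple check. The sketch below is a hypothetical illustration of the kind of test an assembler could apply to shorthand transfers; it is not the actual assembler logic, and the function name and the representation of transfers as (destination, source) pairs are assumptions.

```python
# Sources that must travel over the data pad bus (the eight DB selections).
DB_SOURCES = {"ZERO", "INBS", "VALUE", "DPX", "DPY", "MD", "SPFN", "TM"}
# Sources with their own direct path into data pad.
DIRECT_SOURCES = {"FA", "FM"}

def data_pad_bus_choice(transfers):
    """Return the single DB selection implied by shorthand transfers.

    transfers is a list of (destination, source) pairs, for example
    [("DPX", "FA"), ("DPY", "MD")]. Returns None if DB is not needed and
    raises ValueError if two different DB selections would be required.
    """
    choices = set()
    for _destination, source in transfers:
        if source in DIRECT_SOURCES:
            continue  # FA and FM bypass the data pad bus
        if source not in DB_SOURCES:
            raise ValueError(f"unknown source {source!r}")
        choices.add(source)
    if len(choices) > 1:
        raise ValueError(f"only one data pad bus choice per instruction: {sorted(choices)}")
    return next(iter(choices), None)

print(data_pad_bus_choice([("DPX", "MD"), ("DPY", "MD")]))  # both use DB=MD
print(data_pad_bus_choice([("DPX", "FA"), ("DPY", "MD")]))  # FA is direct, DB=MD
```

Calling the function with `[("DPX", "MD"), ("DPY", "TM")]` raises `ValueError`, which corresponds to the error case in the shorthand table.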
3.5 DATA MEMORY


The following sections describe data memory.


3.5.1 MEMORY ADDRESSING


Main data memory cycles are initiated by changing the memory address
register (MA), which points to the memory location to be read from or
written into:


INCMA    "Increment MA by 1
DECMA    "Decrement MA by 1
SETMA    "Load MA with the current s-pad
         "function (SPFN, refer to section 3.7)


All of the above initiate a memory cycle at the address pointed at by
the new contents of MA. If a memory input (MI) field is also included
in the instruction, then the memory cycle is a write cycle. Otherwise,
a read cycle is initiated. When sequential memory locations are
accessed, a new memory cycle may be initiated by every other AP
instruction.
3.5.2 DATA MEMORY READS


Data read from memory is available for use three instruction cycles
after the read is initiated. The instruction sequence shown in Example
11 illustrates how memory data is accessed: A, B, and C are
floating-point numbers in memory locations 101, 102, and 103,
respectively. It is assumed that MA is set to 100 before starting.


Example 11


TIME   CYCLE   INSTRUCTION   MEMORY ADDRESS (MA)   MEMORY DATA RESULT (MD)

[Table entries not recoverable.]
Three AP cycles after a given memory location is read, data from that
location is ready in the memory data register (MD) and available for
use. MD may be used by the adder or the multiplier as follows:


FADD DPX(3),MD; FMUL DPY(-2),MD    "Do MD+DPX and MD*DPY


It can also be placed on the data pad bus and stored in data pad or
back into memory as follows:


DPX(2)<MD    "Store MD into DPX.


3.5.3 AN EXAMPLE


Example 12 loads a vector A(i), i=0,1,2, stored in memory locations
101, 102, and 103 into DPX locations 10, 11, and 12. It is assumed that
MA is set to 100 and DPA is set to 10 before starting.


Example 12


1. INCMA              "Fetch A0 from memory

3. INCMA              "Fetch A1 from memory

4. DPX<MD; INCDPA     "Store A0 into DPX location 10
                      "and bump DPA pointer to 11

5. INCMA              "Fetch A2 from memory

6. DPX<MD; INCDPA     "Store A1 into DPX location 11
                      "and bump DPA pointer to 12

8. DPX<MD             "Store A2 into DPX location 12
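
The interplay of the three-cycle read latency with the data pad stores can be traced in software. The sketch below is a simplified model written for illustration: the function name and the dictionary representation are hypothetical, addresses are taken to be octal, and only the behavior described in the text (data reaches MD three cycles after the read starts, and a DPA change takes effect on the next instruction) is modeled.

```python
READ_LATENCY = 3  # AP cycles between starting a read and MD holding the data

def run_example_12(memory, ma=0o100, dpa=0o10):
    """Trace the load of A0, A1, A2 from memory into DPX; returns the DPX contents."""
    # Cycle number -> operations coded in that instruction word.
    program = {
        1: {"INCMA"},
        3: {"INCMA"},
        4: {"DPX<MD", "INCDPA"},
        5: {"INCMA"},
        6: {"DPX<MD", "INCDPA"},
        8: {"DPX<MD"},
    }
    dpx = {}
    md = None
    pending = []  # (cycle at which the data reaches MD, data)
    for cycle in range(1, max(program) + 1):
        # A read started READ_LATENCY cycles ago arrives in MD now.
        for ready_cycle, value in pending:
            if ready_cycle == cycle:
                md = value
        ops = program.get(cycle, set())
        if "INCMA" in ops:
            ma += 1
            pending.append((cycle + READ_LATENCY, memory[ma]))
        if "DPX<MD" in ops:
            dpx[dpa] = md
        if "INCDPA" in ops:
            dpa += 1  # the new DPA is seen by the next instruction
    return dpx

memory = {0o101: "A0", 0o102: "A1", 0o103: "A2"}
for location, value in run_example_12(memory).items():
    print(f"DPX location {location:o}: {value}")
```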
Example 13 illustrates the transfer of Example 12 showing the state of
each component after each instruction.


Example 13


[Table: state of MEMORY and DATA PAD after each CYCLE; entries not
recoverable.]
3.5.4 DATA MEMORY WRITES


Data memory write cycles are indicated by the following:


MI<FA    "write the adder result into memory
MI<FM    "write the multiplier result into memory
MI<DB    "write data pad bus into memory


These instructions load data into the memory input buffer register from
where it is written into memory. Data may be written into sequential
memory locations by every other AP instruction.
3.5.5 AN EXAMPLE


Example 14 squares the elements of a vector A(i), i=0,1,2, in DPX
locations 10, 11, and 12 and stores the results into data memory
locations 101, 102, and 103. It is assumed that MA is set to 100 and
DPA is set to 10 before starting.


Example 14

1. FMUL DPX,DPX; INCDPA     "Square A0, bump DPA pointer
                            "to 11
2. FMUL                     "Push down the multiplier
                            "pipeline
3. FMUL DPX,DPX; INCDPA     "Square A1, bump DPA pointer
                            "to 12
4. FMUL; MI<FM; INCMA       "Write A0 squared into memory location
                            "101
5. FMUL DPX,DPX             "Square A2
6. FMUL; MI<FM; INCMA       "Write A1 squared into memory location 102
7. FMUL                     "Dummy FMUL to empty pipeline
8. MI<FM; INCMA             "Write A2 squared into memory location 103
Example 15 illustrates the sequential data memory write computation.


Example 15

[Table: contents of the MULTIPLIER stages and of the MEMORY input (MI)
for each CYCLE of Example 14; entries not recoverable.]
3.5.6 MEMORY INTERLEAVE

Data memory is divided into 16 banks of 4K words each using MA00-MA02
and MA15 as a memory bank select. (These are the three highest-order
bits and the least-significant bit of MA.) Memory references to
different banks may be made every two AP cycles, while references to
the same bank may be made every three AP cycles. For some possible
memory addressing sequences refer to Table 3-2.


Table 3-2 Memory Interleave Sequence


MEMORY ADDRESS SEQUENCE (OCTAL)   MEMORY BANK SEQUENCE   MEMORY REFERENCE TIMING

101, 102, 103, 104, ...           1, 0, 1, 0, ...        every 2 AP cycles
[illegible]                       [illegible]            every 2 AP cycles
100, 102, 104, 106, ...           0, 0, 0, 0, ...        every 3 AP cycles
[illegible]                       [illegible]            every 2 AP cycles


Thus, references to successive sequential memory locations can be made
every other AP cycle, but references to successive-odd or
successive-even locations must be three cycles apart.
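
The bank selection and the resulting reference spacing can be sketched as follows. This is an illustration under stated assumptions: MA is treated as a 16-bit value with bit 00 as the most significant bit, and the way the four select bits are combined into a bank number (high-order bits first, least-significant bit last) is a guess chosen so that low addresses alternate between banks 0 and 1. The function names are hypothetical.

```python
def bank(address: int) -> int:
    """Bank number formed from MA00-MA02 (three high bits) and MA15 (low bit)."""
    high_bits = (address >> 13) & 0b111
    low_bit = address & 1
    return (high_bits << 1) | low_bit  # assumed ordering of the select bits

def earliest_reference_cycles(addresses):
    """Earliest cycle at which each reference in a sequence can start."""
    cycles = []
    for index, address in enumerate(addresses):
        if index == 0:
            cycles.append(0)
            continue
        same_bank = bank(address) == bank(addresses[index - 1])
        cycles.append(cycles[-1] + (3 if same_bank else 2))
    return cycles

sequential = [0o101, 0o102, 0o103, 0o104]
even_only = [0o100, 0o102, 0o104, 0o106]
print([bank(a) for a in sequential], earliest_reference_cycles(sequential))
print([bank(a) for a in even_only], earliest_reference_cycles(even_only))
```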
3.5.7 MEMORY LOCKOUT

If memory references are made too rapidly for memory to handle, the CPU
suspends program execution and spins until the memory is no longer
busy. Thus, suppose the following were coded:

1. INCMA    "referencing memory every cycle
2. INCMA
3. INCMA


The following execution is the result:


  0ns    1. INCMA
167ns    2. INCMA
333ns    "SPIN"
500ns    3. INCMA
667ns    "SPIN"

The processor waits an extra cycle after instructions 2 and 3 because 
memory is still busy from the previous memory references. This 
arrangement is fine if there is no useful computing to do during the 
spin cycles. Otherwise, it is better to space out the INCMAs and to do 
something useful during the cycle between memory references. 
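
The execution trace for back-to-back INCMA instructions can be reproduced with a small model. The sketch below is an assumption-laden illustration, not a description of the lockout hardware: it assumes sequential addresses (so memory accepts a new reference every two cycles), an exact cycle time of one sixth of a microsecond, and a spin that occupies the cycle after the stalled instruction, as in the trace above. The function name is hypothetical.

```python
CYCLE_NS = 1000.0 / 6.0  # assumed exact cycle time, about 167 ns

def lockout_trace(instruction_count, spacing=2):
    """Trace consecutive INCMA instructions; returns (time in ns, text) pairs."""
    trace = []
    cycle = 0
    memory_free_at = 0  # first cycle at which memory can accept a reference
    for number in range(1, instruction_count + 1):
        trace.append((round(cycle * CYCLE_NS), f"{number}. INCMA"))
        while cycle < memory_free_at:
            cycle += 1  # processor spins until memory is no longer busy
            trace.append((round(cycle * CYCLE_NS), '"SPIN"'))
        memory_free_at = cycle + spacing
        cycle += 1
    return trace

for time_ns, text in lockout_trace(3):
    print(f"{time_ns:4d}ns  {text}")
```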
3.6 TABLE MEMORY


The following sections describe table memory.


3.6.1 TABLE MEMORY ADDRESSING


Constants stored in table memory are read by setting the table memory
address (TMA) register to the address of the desired table memory
location. This is done with the following instructions:


INCTMA    "increments TMA by 1
DECTMA    "decrements TMA by 1
SETTMA    "set TMA to the current s-pad
          "function (SPFN)


Each of the above initiates a fetch from the table memory location
pointed at by the new contents of TMA. Two AP cycles later, the
contents of the desired location are available for use. A new
location can be fetched every AP cycle. The sequence in Example 16
illustrates how table memory is accessed. K0, K1, and K2 are constants
stored in table memory locations 235, 236, and 237. It is assumed that
TMA is set to 234 before starting.


Example 16


CYCLE   INSTRUCTION   TABLE MEMORY ADDRESS (TMA)   TABLE MEMORY RESULT (TM)
Two cycles after a given table memory location is fetched, the data is
ready in the table memory data register and is available for use. TM
can be used by the adder or the multiplier:


FADD TM,DPX(2); FMUL TM,DPY(-3)    "do TM+DPX and TM*DPY

or put on the data pad bus and stored into data pad:


DPX(-1)<TM    "store TM into DPX


3.6.2 AN EXAMPLE


Example 17 forms the vector sum A(i) = B(i)+K(i), i=0,1,2, where A(i)
is in DPX locations 10-12, B(i) is in DPY 10-12, and K(i) is a series
of constants stored in table memory locations 235-237. A(i) is stored
back into DPX. It is assumed that DPA is set to 10 and TMA is set to
234 before starting.
Example 17


1. INCTMA                          "Fetch K0
2. INCTMA                          "Fetch K1
3. INCTMA; FADD TM,DPY; INCDPA     "Do K0 + B0, bump DPA to 11
4. FADD TM,DPY; INCDPA             "Do K1 + B1, bump DPA to 12
5. FADD TM,DPY(0); DPX(-2)<FA      "Do K2 + B2, store A0 in DPX 10
6. FADD; DPX(-1)<FA                "Store A1 in DPX 11
7. DPX(0)<FA                       "Store A2 in DPX 12

Example 18 illustrates the computations of Example 17.

Example 18

[Table: contents of table MEMORY, the ADDER stages, and DATA PAD X for
each CYCLE of Example 17; entries not recoverable.]
3.6.3 A COMPLEX MULTIPLY


An example using both memories, a complex multiply from the FFT (fast
Fourier transform) algorithm, is shown in Example 19. The multiply is
between a complex signal point held in data memory and a complex
exponential value (a root of unity, e^(i*theta)) fetched from table
memory. The computation is:


XR = CR * WR - CI * WI
XI = CR * WI + CI * WR

where C is the data point and W is the complex exponential; R and I
denote real and imaginary parts, respectively. C is in main data
memory, and W is in table memory.


Example 19

Fetch the four arguments:
 1. INCMA                     "Fetch CR from data memory
 2. INCTMA                    "Fetch WR from table memory
 3. INCMA; INCTMA             "Fetch CI, fetch WI

Do the multiplies:
 4. FMUL TM,MD                "Do CR * WR
 5. FMUL TM,MD; DECTMA        "Do CR * WI, fetch WR
 6. FMUL TM,MD                "Do CI * WI
 7. FMUL TM,MD; DPX(0)<FM     "Do CI * WR, save CR*WR in DPX
 8. FMUL; DPX(1)<FM           "Save CR*WI in DPX

Do the two adds:
 9. FMUL; FSUBR FM,DPX(0)     "Do XR = CR*WR - CI*WI
10. FADD FM,DPX(1)            "Do XI = CR*WI + CI*WR
11. DPX(0)<FA; FADD           "XR is ready, save in DPX
12. DPX(1)<FA                 "XI is ready, save in DPX
The total elapsed time is 12 cycles or 2us. In practice, however, all
but cycles four through seven can overlap with the preceding and
following computations. The complex multiply then takes only 667ns
when mixed in with other computations.
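
For reference, the arithmetic of the complex multiply (four real multiplies, one subtract, one add) can be written out and compared against a built-in complex product. The sketch below is illustrative; the function name and the sample values are hypothetical, and the cycle time of one sixth of a microsecond is an assumption behind the quoted 2us and 667ns figures.

```python
def complex_multiply(cr, ci, wr, wi):
    """Return (XR, XI) for X = C * W using four multiplies and two adds."""
    xr = cr * wr - ci * wi
    xi = cr * wi + ci * wr
    return xr, xi

# Sample values chosen for illustration only.
xr, xi = complex_multiply(1.0, 2.0, 3.0, 4.0)
print(xr, xi)
print(complex(1.0, 2.0) * complex(3.0, 4.0))  # same result from Python's complex type

CYCLE_NS = 1000.0 / 6.0  # assumed exact cycle time, about 167 ns
print(round(12 * CYCLE_NS), "ns for all 12 cycles")
print(round(4 * CYCLE_NS), "ns for cycles four through seven")
```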


Example 20 summarizes the complex multiply. 


Example 20


[Table: contents of the MEMORIES (MD, TM), MULTIPLIER (stages and FM),
ADDER (A1, A2, FA), and DATA PAD for each CYCLE of Example 19; entries
not recoverable.]
The s-pad is a 16-bit wide integer unit used primarily to compute
memory address pointers and to test loop counters. It is similar in
capability to a minicomputer and is programmed like the
register-to-register instructions of the Nova and PDP-11 computers.
There are 16 registers in the s-pad unit.


3.7.1 SINGLE OPERAND INSTRUCTIONS


Table 3-3 lists the single operand instructions. One item can be
chosen from each column.


Table 3-3 Single Operand Instructions


OPERATION   SHIFT   NO LOAD   DESTINATION REGISTER


The operation is performed upon the contents of the destination
register (DST), and that result is shifted. The shifted result is
stored in the destination register unless a no load (#) is specified.
The shifted result is the s-pad function (SPFN), which may be stored
into an address register (MA, TMA, or DPA) or placed onto the data pad
bus (DB=SPFN). Some examples, where SPn refers to the contents of s-pad
register "n", are illustrated in Example 21.
Example 21


INC 6               "(SP6+1) -> SP6
DECR 3              "(SP3-1)/2 -> SP3
COM 3; DPX<SPFN     "complement of SP3 -> SP3 -> DPX
CLR# 2; SETDPA      "0 -> DPA; because of # (no load)
                    "SP2 remains unchanged

3.7.2 DOUBLE OPERAND INSTRUCTIONS 


Table 3-4 lists the double operand instructions. One item can be 
chosen from each column. 


Table 3-4 Double Operand Instructions 


OPERATION   SHIFT   NO LOAD   DEC IMATE   SOURCE REGISTER (src),   DESTINATION REGISTER (dst)
The operation is performed between the source (SRC) and destination
(DST) registers. If bit reverse (X) is specified, the contents of
source are bit-reversed before being used. The shift is performed on
the result which is then stored into the destination register unless no
load (#) is specified. The shifted result is the s-pad function
(SPFN), which may be stored into TMA, MA, or DPA or placed onto the
data pad bus.


Example 22


MOV 3,15             "SP3 -> SP15
ADDL 6,10; SETMA     "((SP10) + (SP6)) * 2 -> SP10 -> MA
SUB 7,13             "(SP13-SP7) -> SP13
AND# 5,11; SETDPA    "(SP11 AND SP5) -> DPA
OR# 46,7; SETTMA     "(SP7 OR SP6 (bit-reversed)) -> TMA
MOVRR 2,2            "(SP2)/4 -> SP2
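
The operate, shift, store sequence can be modeled in software. The sketch below is a loose illustration and not a definition of the instruction set: the function names are hypothetical, register numbers are taken to be octal, values are treated as unsigned 16-bit integers, and right shifts are logical, all of which are assumptions.

```python
MASK_16 = 0xFFFF

def bit_reverse16(value: int) -> int:
    """Reverse the order of the 16 bits of value."""
    return int(f"{value & MASK_16:016b}"[::-1], 2)

def spad_op(regs, op, src, dst, shift=0, no_load=False, bit_reverse=False):
    """Apply op between regs[src] and regs[dst]; return the s-pad function (SPFN).

    shift > 0 shifts left by that many bits, shift < 0 shifts right.
    With no_load=True the destination register is left unchanged.
    """
    source = bit_reverse16(regs[src]) if bit_reverse else regs[src]
    dest = regs[dst]
    results = {
        "MOV": source,
        "ADD": dest + source,
        "SUB": dest - source,
        "AND": dest & source,
        "OR": dest | source,
    }
    result = results[op] & MASK_16
    result = (result << shift) & MASK_16 if shift >= 0 else result >> -shift
    if not no_load:
        regs[dst] = result
    return result

regs = [0] * 16
regs[0o6], regs[0o10], regs[0o2] = 3, 5, 20
ma = spad_op(regs, "ADD", 0o6, 0o10, shift=1)       # like ADDL 6,10; SETMA
quarter = spad_op(regs, "MOV", 0o2, 0o2, shift=-2)  # like MOVRR 2,2
print(ma, regs[0o10], quarter, regs[0o2])
```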
For purposes of program clarity, the assembler allows names to be given
to the s-pad registers. If register PTR is a pointer to an array in 
data memory, and register STEP contains the increment value used to 
step through the array, then the following instruction word advances 
the array pointer by the proper increment and fetches the next array 
element from memory: 


ADD STEP,PTR; SETMA 
The following branches are available; the conditional branches test the
s-pad function:


BR LOOP     "branch unconditionally to program
            "location "LOOP"
BEQ LOOP    "branch if SPFN=0
BNE LOOP    "branch if SPFN is not equal to 0
BGE LOOP    "branch if SPFN>=0
BGT LOOP    "branch if SPFN>0


The above branches test the s-pad result from the immediately preceding
AP instruction. Thus, an s-pad operation must be done one instruction
cycle before it is desired to test the result.


An example of loop counting is shown in Example 23. 


Example 23 


DEC 2       "decrement SP2
BNE LOOP    "branch to "LOOP" if SP2 has not
            "yet reached zero


Example 24 tests the contents of SP3 to see if it is between a lower
limit contained in SP2 and an upper limit in SP4 (i.e., if
SP2<=SP3<=SP4).
Example 24


SUB# 3,2
SUB 4,3; BGT SMALL    "Too small, SP3<SP2
BGT BIG               "Too big, SP3>SP4


The branches are made relative to the current program source address
with a 5-bit displacement value. This means that the branch target
address must be within -20 (octal) to +17 (octal) locations of the
current instruction.
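
The reach of a 5-bit signed displacement can be checked with a short helper. The sketch below is illustrative: the function name is hypothetical, and encoding the displacement as a 5-bit two's complement field is an assumption consistent with the stated range of -20 to +17 (octal).

```python
def branch_displacement(current: int, target: int) -> int:
    """Return the 5-bit displacement field for a branch, or raise if out of reach."""
    displacement = target - current
    if not -0o20 <= displacement <= 0o17:
        raise ValueError(f"target is {displacement} locations away; out of branch range")
    return displacement & 0b11111  # assumed two's complement encoding

print(oct(branch_displacement(0o100, 0o117)))  # farthest forward target
print(oct(branch_displacement(0o100, 0o60)))   # farthest backward target
```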


3.7.4 AN EXAMPLE


Example 25 loads data pad X with an array A with N elements starting
at main data memory location 3721. CTR is an s-pad register which is
used as a counter.


Example 25

1.       CLR# CTR; SETDPA              "Set DPA to 0
2.       LDMA; DB=3721                 "Fetch the first element
3.       LDSPI CTR; DB=N               "Initialize "CTR" to N
4. LOOP: INCMA; DEC CTR                "Fetch next element, A(i+1)
5.       DPX<MD; INCDPA; BNE LOOP      "Store A(i) into DPX(i), advance
                                       "DPA and test counter
Example 26 shows the loop in Example 25 for N=3 elements.


Example 26


[Table: state of MEMORY and DATA PAD after each INSTRUCTION NUMBER;
entries not recoverable.]
A generalization of the previous example to fetch array A from every
Kth memory location is shown in Example 27. The increment is stored in
s-pad register STEP, and the array pointer is stored in PTR.


Example 27


1.       LDSPI STEP; DB=K              "Initialize "STEP" to K
2.       CLR# CTR; SETDPA              "Set DPA to 0
3.       LDMA; DB=BASE                 "Fetch the first element, A0
4.       LDSPI CTR; DB=N               "Initialize "CTR" to N
5. LOOP: ADD STEP,PTR; SETMA;          "Advance memory pointer. Fetch
         BEQ DONE                      "next element, A(i+1). Test
                                       "counter and jump out if
                                       "done.
6.       DPX<MD; INCDPA;               "Store A(i) into DPX(i), advance DPA
         DEC CTR; BR LOOP              "Decrement "CTR" and jump
                                       "back to LOOP.
7. DONE: --
CHAPTER 4


INTERFACE


4.1 INTRODUCTION


This chapter describes the interface between the host computer and the
AP. The interface is composed of two basic parts: a simulated front
panel and direct memory access control. The front panel allows the
host computer to examine or modify the internal AP registers, and the
interface also provides for block transfer of data from the host
computer to the AP, and vice versa.


4.2 FRONT PANEL


The AP panel is used for bootstrap operations (loading and starting
programs) and for debugging user software (inserting hardware
breakpoints and examining and modifying AP registers and memory). The
panel consists of three 16-bit registers which are under the control of
the host via the host interface. The functioning of these registers
closely parallels that of the switches and lights on the console of a
stand-alone computer. The host can examine and/or set these registers
at any time, regardless of the state of the AP. The front panel and
host interface are shown in Figure 4-1.
[Figure: block diagram with the labels HOST DATA BUS, HOST DMA ADDRESS
BUS, BREAKPOINT, FORMAT, AP I/O BUS, APMA (direct to memory address),
AP CPU, AP REGISTERS, and MAIN DATA.]


AP DEVICE ADDRESSES

REGISTER    DA
WC          0
HMA         1
CTL         2
APMA        3
FORMAT      4


Figure 4-1 AP Panel and Host Interface
4.2.1 SWITCH REGISTER


The switch register (SWR) is used to enter data and addresses into the 
AP. The SWR can be read and written by the host computer. An 
executing AP program can also read the switches. 


4.2.2 LIGHTS REGISTER 


The lights register (LITES) simulates front-panel lights and is used to 
display the contents of internal AP registers. This register can only 
be read by the host. The executing AP program can set the lights 
register. 


4.2.3 FUNCTION REGISTER 


The function register (FN) provides front-panel control operations 
(start, stop, continue, etc.). It can be read or written by the host. 
The format of the function register is shown in Figure 4-2. 


  0      1      2      3      4       5     6      7      8-9   10-11    12-15
STOP | START | CONT | STEP | RESET | EXAM | DEP | BREAK | INC | WORD | REGISTER SELECT


Figure 4-2 Panel Function Register Format


When the AP is running, only the STOP and RESET panel functions are 
valid. The other panel functions can only be exercised after the AP 
has halted. The panel functions are described in Table 4-1. 
Table 4-1 Function Register Bits


BIT       MNEMONIC       EFFECT

0         STOP/HALTED    Stop AP program execution upon completion of the
                         current instruction. When the host reads the FN
                         register, this bit reflects the current state of
                         the processor. This bit is set if the AP is
                         halted. (See note.)

1         START          Start program execution at the address specified
                         in SWR.

2         CONT           Continue program execution at the instruction
                         pointed at by PSA (program source address).

3         STEP           Execute the instruction pointed at by PSA and
                         then halt. Advance PSA to point to the next
                         instruction.

4         RESET          Stop the AP immediately. Clear s-pad register 0.
                         Set SPFN to SP(SPD). Clear the AP status
                         register. Stop the host DMA (CTL bit 15 set to 0)
                         and clear main data memory timing.

5         EXAM           Examine the register or memory selected by the
                         register select field. Display the portion
                         selected by the WORD field in the panel display
                         register.

6         DEP            Deposit the contents of the switch register into
                         the register or memory selected by the register
                         select field. Deposit into the portion selected
                         by the WORD field.

7         BREAK          Enables hardware breakpointing if PSA, MA, or TMA
                         is specified in the register select field. The
                         breakpoint causes the AP to halt one instruction
                         after any instruction where the contents of the
                         selected register were equal to the contents of
                         the switch register. Thus, if a breakpoint is
                         specified with PSA selected, the AP halts after
                         executing the instruction at the program location
                         set in the switch register. PSA points to the
                         next micro-instruction in sequence. If a
                         breakpoint is called for on MA or TMA, the AP
                         halts after executing the instruction following
                         the one that referenced the trapped memory
                         location. PSA points to the second sequential
                         instruction after the one that caused the
                         breakpoint. Memory breakpoints aid in debugging
                         those elusive errors that modify memory
                         unexpectedly.

8 & 9     INC            Increment MA, TMA, or DPA following completion of
                         the other specified panel functions. This allows
                         sequential memory locations to be examined or
                         deposited into. (Refer to Table 4-2.)

10 & 11   WORD           Specifies which portion of a register is being
                         examined or deposited into. (Refer to Table 4-3.)

12 - 15   REG. SELECT    Specifies which AP internal register or memory
                         location to examine or deposit into. (Refer to
                         Table 4-4.)


NOTE


If the current instruction performs a SPIN while waiting for I/O or
memory, the STOP does not take effect until the spin condition is
satisfied and the instruction completed.
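
A host-side routine might assemble a function register word from its fields as sketched below. This is a hypothetical illustration and not part of any host software: it assumes bit 0 is the most significant bit of the 16-bit word, that bits 0 through 5 are STOP, START, CONT, STEP, RESET, and EXAM in that order, and the register select value used in the usage line is a placeholder.

```python
# Assumed bit positions (bit 0 taken as the most significant bit).
FLAG_BITS = {
    "STOP": 0, "START": 1, "CONT": 2, "STEP": 3,
    "RESET": 4, "EXAM": 5, "DEP": 6, "BREAK": 7,
}

def fn_word(flags, inc=0, word=0, reg_select=0):
    """Pack panel function flags and fields into a 16-bit FN register value."""
    if not (0 <= inc <= 3 and 0 <= word <= 3 and 0 <= reg_select <= 0o17):
        raise ValueError("field value out of range")
    value = 0
    for name in flags:
        value |= 1 << (15 - FLAG_BITS[name])
    value |= inc << 6    # bits 8-9: address register to increment
    value |= word << 4   # bits 10-11: portion of the register
    value |= reg_select  # bits 12-15: register or memory selected
    return value

# Examine with increment field 1; register select 3 is a placeholder value.
print(f"{fn_word(['EXAM'], inc=1, reg_select=0o3):06o}")
```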


FPS 860-7259-003


Table 4-2 Bits 8-9


VALUE IN BITS 8 & 9    ADDRESS REGISTER TO BE INCREMENTED

0                      None
1                      MA (Memory Address)
2                      DPA (Data Pad Address)
3                      TMA (Table Memory Address)
Table 4-3 Bits 10-11


VALUE SET IN     16-BIT        38-BIT REGISTER              64-BIT REGISTER
BITS 10 & 11     REGISTER

1                [illegible]   Exponent, bits 00-09;        Bits 16-31
                               right-justified in
                               16-bit field.

2                N/A           High mantissa, bits 00-11;   Bits 32-47
                               right-justified

[Remaining rows not recoverable.]
Table 4-4 Octal Values


OCTAL VALUE SET IN BITS 12-15    REGISTER OR MEMORY SELECTED    DESCRIPTION

[The octal value and register name columns are not recoverable; the
descriptions follow in table order.]

Program Source Address
S-Pad Destination Address
Main Data Address
Table Memory Address
Data Pad Address
S-Pad Function (EXAM); S-Pad addressed by SPD (DEPOSIT)
AP Internal Status Reg.
Device Address Register
Program Source Memory addressed by TMA
Examine I/O device output register addressed by DA
Control Buffer, Bits 48-63 (EXAM only)
Data Pad X addressed by (DPA-4)
Data Pad Y addressed by (DPA-4)
Main Data Memory addressed by MA
S-Pad Function (EXAM) only
Table Memory addressed by TMA (EXAM only)
4.3 NOTES ON THE USE OF THE FRONT PANEL AND BREAKPOINT


4.3.1 WHERE DOES THE AP STOP ON A BREAKPOINT?


• With the breakpoint set on PSA, the AP stops
  with PSA pointing to the next instruction to be
  executed.

  Thus, breaking on a branch instruction and then
  examining PSA shows whether the branch
  condition is true or false.


• With the breakpoint set on TMA, the AP stops
  with PSA pointing to the second instruction following
  the one that set TMA to the break address.


• With the breakpoint set on MA, the AP stops on
  either the next instruction or the second instruction
  after the one that set MA to the break address, depending
  on the state of the memory lockout hardware (next
  instruction if memory lockout, second instruction if no
  memory lockout).

  Thus, the stopping point following an MA breakpoint
  has a one-instruction uncertainty.


4.3.2 DOES THE INSTRUCTION ON WHICH THE AP STOPS EXECUTE?


Since SPFN is current, it is set to the operation specified in the
instruction that PSA is pointing to. Otherwise, the instruction that
PSA is pointing to remains unexecuted. It executes correctly when the
user steps or proceeds from the breakpoint.
4.3.3 WHAT ABOUT MD TIMING AND LOCKOUT ON A BREAKPOINT IN THE
MIDDLE OF AN MD MEMORY CYCLE?


• The hardware is designed so that the AP can be
  stopped in the middle of a memory cycle. The hardware
  remembers where the memory timing is when the AP
  stops so that the processor can continue correctly
  from a breakpoint that occurs during a memory cycle.


• However, the user must not examine MD nor should
  there be any DMA transfers going to or from MD
  while the AP is stopped if the user wishes to
  proceed from the breakpoint.

  Thus, for example, it is possible to break in the
  tight-to-memory portions of the FFT and examine
  data pad or the address registers (PSA, SPA, etc.)
  and then proceed. It is not possible to proceed
  if the user or the host interface disturbs the memory
  timing by reading or writing MD or TM.


4.3.4 SUMMARY OF THE RULE FOR PROCEEDING FROM BREAKPOINT


If the breakpoint causes the AP to stop in the middle of the memory
cycle (PSA pointing to first or second instruction following SETMA,
INCMA, DECMA, or LDMA), the user should not try to examine or modify
MD.


4.3.5 WHAT ABOUT STEPPING THE AP?

The same rules for proceeding from a breakpoint apply to stepping the
AP through a program. The user can examine and modify any register or
memory within the constraints mentioned in section 4.3.4.
4.3.6 WHAT OTHER PITFALLS ARE THERE IN THE USE OF THE VIRTUAL
FRONT PANEL?


• Note that the panel always examines SPFN, not SP(SPD).
  Thus, the user must force SPFN = SP(SPD) to see
  SP(SPD). This can most easily be done via the
  panel reset function, which has the side effect of also
  clearing SP(0).


• To examine TM, the user should first set TMA and
  then do a dummy panel operation (deposit TMA again,
  for example) in order to enter the output of table
  memory into the table memory buffer register. The
  user can then proceed to examine the addressed
  location using the appropriate panel functions.


• MD: setting MA from the panel initiates an MD memory
  read cycle. Depositing into MD from the panel
  initiates an MD memory write cycle.

  Thus, to write MD and then examine what was just
  written, the user must perform a deposit into MA
  operation (with the same address) to initiate a
  read cycle before examining MD.


• Using the increment field in the FN register:
  DPA and TMA always increment after the EXAM or DEP
  operation is complete (remember that TMA is used
  to address program source memory for panel operations).

  MA post-increments and initiates a new memory read
  cycle on an EXAM operation.

  MA pre-increments on a DEP operation.
• The recommended procedure for starting the AP is as
  follows:

  1. Set the SWR to the starting address and do a
     deposit into PSA.

  2. Set the SWR to the desired breakpoint and do a
     continue to start the AP.

  This procedure has the significant advantage of placing
  the necessary breakpoint code into the user's program
  should the AP program need debugging.

  The panel START function can be used, but the user should
  observe the following restrictions on the first
  instruction executed by the AP. The first instruction
  should not branch, jump, or modify PSA in any way other
  than to advance to the next instruction. The first
  instruction should not use the SPEC and I/O fields. In
  fact, the preferred first instruction is a NOP (all
  zeros).
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4.4 DIRECT MEMORY ACCESS 


In addition to the panel function, the AP contains four 16-bit 
registers that are used for direct memory access (DMA) to both host and 
AP data memory, plus a 38-bit format conversion register that acts as a 
buffer between the two memories. These registers may be read and/or 
loaded from either the host computer or the AP. 


4.4.1 HOST MEMORY ADDRESS REGISTER 


The host memory address register (HMA) points to consecutive locations 
in the memory of the host computer. It operates in either auto-increment 
or auto-decrement mode during DMA transfers to and from host memory. HMA 
is device address 1 for AP internal I/O transfers. 


4.4.2 WORD COUNT REGISTER 


The word count register (WC) counts the number of host memory words 
transferred in a DMA operation. It is preset to the desired number of 
words to be transferred and counts down as the transfer proceeds, 
stopping the DMA transfer when it reaches zero. Hardware logic 
prevents this register from being counted past zero. WC has AP device 
address 0. 
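
The count-down behavior can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not a description of the hardware: the names `WordCounter` and `run_dma` are hypothetical, and one count is assumed per host word transferred.

```python
class WordCounter:
    """Toy model of WC: 16 bits wide, counts down, never passes zero."""

    def __init__(self, preset):
        self.value = preset & 0xFFFF   # WC is a 16-bit register

    def count_word(self):
        """Count one host word; return True while the transfer may continue."""
        if self.value > 0:             # hardware logic blocks counting past zero
            self.value -= 1
        return self.value > 0


def run_dma(word_count):
    """Return how many host words move before WC reaches zero."""
    wc = WordCounter(word_count)
    moved = 0
    while wc.value > 0:
        wc.count_word()
        moved += 1
    return moved


words_moved = run_dma(800)
```

Presetting the counter to 800, as in the loading example of section 4.7, moves exactly 800 host words in this model.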
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4.4.3 AP DIRECT MEMORY ADDRESS REGISTER 


The AP direct memory address register (APDMA) points to consecutive 
locations in AP main data memory during DMA transfers to and from MD. 
This register can operate in either auto-increment or auto-decrement 
mode. APDMA has AP device address 3. 


4.4.4 CONTROL REGISTER 


The control register (CTL) governs the DMA and interrupt functions of 
the host interface. This register controls the direction and mode of 
transfer (DMA or program control) and the type of data format, and it 
provides certain bits of status information pertaining to the 
transfer. CTL has AP device address 2. The format of the control 
register is shown in Figure 4-3. The bit descriptions are contained in 
Table 4-5. 


[Register diagram not reproduced: a 16-bit word, bits 0 through 15. 
The bit assignments are listed in Table 4-5.] 
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Figure 4-3 DMA Control Register Format 
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Table 4-5 DMA Control Register Description 


BIT       MNEMONIC   EFFECT 

0                    Indicates that the word count register is zero. 
                     Note that WC is decremented only during DMA 
                     transfers to/from host memory (read-only bit). 
                     Should not be used to monitor DMA activity. 

1         INTR AP    Sets the INTRQ (interrupt request) flag in the AP. 

2         IAPWC      Sets the INTRQ (interrupt request) flag in the AP 
                     when the DMA transfer is done. 

3         IHALT      Enables a host interrupt when the AP halts. 

4         IHWC       Enables a host interrupt when the DMA transfer is 
                     done. 

5         IHENB      Interrupt Host Enable. Interrupts the host if the 
                     AP attempts to set this bit. This bit can actually 
                     be written only by the host. (This is not supported 
                     on all host systems.) 

6         FERR       Format error. Indicates that exponent underflow or 
                     overflow occurred in conversion from AP format to 
                     host floating-point format. 

7         DLATE      Data late. Indicates that the AP did not empty the 
                     format buffer before the host attempted to reload 
                     it. On some hosts this bit also indicates an 
                     attempt to access non-existent host memory. In 
                     either case the DMA transfer is terminated. 

8         CC         Consecutive cycle. Block DMA transfers to/from host 
                     memory occur without interruption. On typical 
                     hosts, the host CPU is locked out but other higher 
                     priority DMA devices still have access to host 
                     memory. 

9         APDMA      Allows the interface to perform DMA transfers 
                     to/from AP memory. Depending on the direction of 
                     transfer, a main data memory cycle is initiated 
                     every time the host finishes reading or loading the 
                     format register, whether via DMA or program 
                     control. On the AP side, the format register is 
                     loaded from the main data bus instead of the data 
                     pad bus. 

10        WRTHOST    Write to host. This bit controls the direction of 
                     transfer. If set, data is read from the AP, passed 
                     through the format register, and written to the 
                     host. If clear, the direction of transfer is 
                     reversed. 

11        DECAPMA    Decrement APMA. If set, APMA is decremented during 
                     DMA transfers to/from AP main data memory. If 
                     clear, APMA is incremented. (This capability is not 
                     present on all host systems.) 

12        DECHMA     Decrement HMA. If set, HMA is decremented during 
                     DMA transfers to/from host memory. If clear, HMA is 
                     incremented. 

13 & 14              Format Register Control. (See note.) 

15        HDMA       Host DMA start/busy. Initiates DMA transfers 
                     to/from host memory. When read, the state of this 
                     bit reflects the status of the host DMA activity 
                     ('1' if active, '0' if inactive). Transfers 
                     continue until WC = 0. 


NOTE 


The format register mode of operation is controlled entirely by 
bits 9, 10, 13, and 14 of the control register. Thus, either the host 
or the AP can load and read the format register via program 
control I/O transfers at any time. The programmer must be sure 
that the type of transfer being performed is consistent with these 
bits of CTL for the transfer to be meaningful. (Refer to Table 4-6.) 
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Table 4-6 Bits 13-14 


VALUE IN BITS 13 & 14    FORMAT TYPE 

0                        32-bit integer. No format conversion. Used to 
                         transfer integers or program half-words. 

1                        16-bit integer. 16-bit integers from the host 
                         are converted to unnormalized 38-bit AP FPNs. 
                         The low 16 bits of the AP FPN are sent to the 
                         host. 

2                        Conversion of "signed-magnitude mantissa with 
                         binary exponent" format to/from AP 
                         floating-point format. Includes logic to handle 
                         "phantom bit" formats. 

3                        Conversion of IBM 32-bit format to/from AP 
                         format. IBM format can be specified to have 
                         either sign-magnitude or two's complement 
                         mantissa. 


NOTE 


For format types 2 and 3, the format register has the necessary logic 
to detect overflow and underflow on conversion from AP format and to 
force a signed maximum quantity on overflow or floating-point zero on 
underflow. 
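
As an aid to reading CTL values, the following Python sketch decodes a 16-bit control word into the functions described in Table 4-5. It is an illustration under stated assumptions: bit 0 is taken to be the most significant bit (the loading example in section 4.7 calls bit 15 the LSB), the flag labels are descriptive names chosen here rather than the manual's mnemonics, bit 13 is assumed to be the more significant of the two format bits, and the four format codes are assumed to run 0 through 3 in the order listed in Table 4-6.

```python
# Descriptive labels (not the manual's mnemonics), keyed by CTL bit number.
CTL_FLAGS = {
    0: "word_count_zero",
    1: "interrupt_ap",
    2: "interrupt_ap_on_dma_done",
    3: "interrupt_host_on_halt",
    4: "interrupt_host_on_dma_done",
    5: "interrupt_host_enable",
    6: "format_error",
    7: "data_late",
    8: "consecutive_cycle",
    9: "ap_dma",
    10: "write_to_host",
    11: "decrement_apma",
    12: "decrement_hma",
    15: "host_dma_start_busy",
}

FORMAT_TYPES = {
    0: "32-bit integer",
    1: "16-bit integer",
    2: "signed-magnitude mantissa with binary exponent",
    3: "IBM 32-bit",
}


def bit_mask(bit):
    """Mask for a CTL bit, assuming bit 0 is the MSB of a 16-bit word."""
    return 1 << (15 - bit)


def decode_ctl(value):
    """Return (list of set flags, format type) for a 16-bit CTL value."""
    flags = [name for bit, name in sorted(CTL_FLAGS.items())
             if value & bit_mask(bit)]
    fmt = (value >> 1) & 0b11   # bits 13 and 14, bit 13 assumed more significant
    return flags, FORMAT_TYPES[fmt]


flags, fmt = decode_ctl(0o201)   # the CTL value used in section 4.7, read as octal
```

With these assumptions the value 201 decodes to the consecutive-cycle and host-DMA-start bits with the 32-bit integer format, which matches the description given with the loading example.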
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4.5 FORMAT CONVERSION REGISTER 


This 38-bit double-buffered register is used for all transfers of 
floating-point numbers (FPNs) between the host and the AP. It also 
provides the most efficient path for transfer of microcode half-words 
(32 bits). It performs bidirectional format conversions under the 
direction of bits 9, 10, 13, and 14 of the CTL register. The 
programmer must be aware that the formatter is a slave to these CTL 
bits: nonsense results if transfers to and from the formatter are not 
consistent with them. The host and AP can read the output of the 
formatter at any time without restriction; however, the input to the 
formatter is controlled by CTL bits 9 and 10. 


Table 4-7 CTL Register Bits 9-10 


CTL09   CTL10   INPUT PATH TO FORMATTER 
1 AP Main Data Output 
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The formatter has a ready indicator that can be sampled by the AP. 
This indicator tells the AP when to load new data into the formatter 
(CTL10=1) and when to read data from it (CTL10=0) after the host has 
finished reading or loading the last 16-bit word of a FPN. 


Note that in 16-bit host computers, the interface expects to receive 
words in a different order depending on CTL bit 12 (DECHMA). If bit 12 
is clear (i.e., the host DMA interface is going through memory in 
forward order from low to high addresses), then the interface expects 
to receive the high word of an FPN followed by the low word. If bit 12 
is set, the interface expects to receive the low word followed by the 
high word. This is done so that arrays of FPNs are always stored in 
forward order in host memory. 
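
The effect of DECHMA on word order can be illustrated with a short Python sketch. The function names are hypothetical and the string labels stand in for real 16-bit words; the only behavior modeled is the one stated above, namely that the low word arrives first when host memory is walked backward.

```python
def words_seen_by_interface(host_memory, start, count, dechma):
    """Host words in the order the DMA interface receives them."""
    step = -1 if dechma else 1
    return [host_memory[start + i * step] for i in range(count)]


def pair_fpn_words(words, dechma):
    """Group received words into (high, low) pairs, one pair per FPN."""
    pairs = []
    for i in range(0, len(words) - 1, 2):
        first, second = words[i], words[i + 1]
        # Backward transfers deliver the low word first.
        pairs.append((second, first) if dechma else (first, second))
    return pairs


# Two FPNs stored in forward order: high word at the lower address.
host_memory = ["HI0", "LO0", "HI1", "LO1"]

forward = pair_fpn_words(words_seen_by_interface(host_memory, 0, 4, False), False)
backward = pair_fpn_words(words_seen_by_interface(host_memory, 3, 4, True), True)
```

Both directions recover the same (high, low) pairs from the same memory image; only the order in which the numbers arrive is reversed.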


If the format CTL bits (bits 13 and 14) specify a 16-bit transfer 
(FMT=1), then the integer is loaded and read from the low word of the 
formatter. That word is considered to be the last word transferred. 


There is no corresponding indicator to the host since the AP can 
transfer data to and from the formatter faster than most host 
processors. The DLATE bit in the CTL register (CTL bit 7) does 
indicate when an error of this type occurs (i.e., when the host 
transfers data faster than the AP). 
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4.6 AP INTERNAL INTERFACE TO HOST INTERFACE 


The registers in the host interface are accessible to the AP via its 
input/output (I/O) instructions (FADD=7). 


Table 4-8 AP Device Address for Host Interface Registers 


I/O DEVICE                                  DEVICE ADDRESS 

HOST INTERFACE DMA REGISTERS: 
   WORD COUNT REGISTER (WC)                 0 
   HOST MEMORY ADDRESS REGISTER (HMA)       1 
   CONTROL REGISTER (CTL)                   2 
   AP MEMORY ADDRESS REGISTER (APMA)        3 
   FORMATTER (FMT)                          4 

WRITABLE TABLE MEMORY (TMRAM)               5 

PAGE SELECT OPTION: 
   MEMORY ADDRESS EXTENSION (MAE) 
   APMA EXTENSION (APMAE) 
   MASK (including MODE and I/O) 

ADDITIONAL DEVICE ADDRESSES: 
   First IOP16                              10-14 
   Second IOP16                             20-24 
   Parity Option                            33-37 
   First PIOP                               100, 101, 110-117 
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An IN, OUT, or SNSA instruction at DA=4 (FORMAT) generates an IODRDY 
response if the format register is ready to accept data from the AP 
(CTL bit 10=1) or if it has formatted data ready for the AP (CTL bit 
10=0). If CTL bit 9 is 1, the AP cannot load the formatter via I/O 
instructions since the input multiplexer to the format register is set 
to select main data instead of the AP I/O bus. Note that the AP cannot 
change the state of CTL bit 5. An interrupt of the host is generated 
if the AP attempts to set this bit when the bit has already been set by 
the host. The AP can read the CTL at any time without interfering with 
the host interface. If both the host and the AP try to write CTL or 
access HMA, WC, or APMA at the same time, the host selection and data 
have priority over those of the AP. 


Access to the format conversion register is controlled by CTL bits 9, 
10, 13, and 14. Refer to section 4.4 for a description of the function 
of these bits. 
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4.7 AN EXAMPLE OF LOADING PROGRAMS INTO THE AP 


Loading and running a program in the AP from a cold start is a 
five-step process which illustrates use of the front panel. 


1. Using the AP front panel from the host computer, 
   finger switch in a three-instruction bootstrap 
   program into program memory. 


2. Start the bootstrap running. 


3. Set the address in the AP where the loaded 
   program is to go. 


4. Start a DMA transfer of program words from 
   host computer memory to the AP. The bootstrap 
   program running in the AP stores these words 
   into program memory. 


5. When the DMA transfer is done, stop the bootstrap 
   program in the AP and then restart the AP 
   executing the newly-loaded program. 


These five steps are detailed in the remainder of Chapter 4. DMA 
control and front panel interrogation are done from the host computer 
by setting various interface registers. The actual host computer I/O 
instructions to accomplish this, of course, depend upon the particular 
host computer. For the purposes of this explanation, the indicated 
numbers are loaded into a designated interface register in order to 
accomplish the desired goals. 
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Step 1: 


For the purpose of this example, the bootstrap program is put into 
program source memory locations 0, 1, and 2. 


1. Set TMA to 0 (TMA is the pointer used by the panel 
   functions for examining or depositing into program 
   memory): 

      0 --> SWR               Put 0 into the switches. 
      1003 --> FN             Put 1003 into the function register 
                              (causing a deposit into TMA). 


2. Put bits 0-63 of bootstrap program word no. 1 
   into program memory location 0 using four deposits 
   of SWR into the PS location addressed by TMA. 

      (bits 0-15) --> SWR     Put bits 0-15 into the switches. 
      1010 --> FN             Put 1010 into the function register 
                              (causes a deposit into bits 
                              0-15 of PS). 

      (bits 16-31) --> SWR    Put bits 16-31 into bits 
      1030 --> FN             16-31 of PS. 

      (bits 32-47) --> SWR    Put bits 32-47 into bits 
      1050 --> FN             32-47 of PS. 

      (bits 48-63) --> SWR    Put bits 48-63 into bits 
      1370 --> FN             48-63 of PS, and increment 
                              TMA to point to location 1. 


3. Repeat no. 2 above for the second and third bootstrap 
   program words. 


It is necessary to perform these steps only once. 
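
The four deposits per program word can be generated mechanically. The Python sketch below is illustrative only: the function name `deposit_sequence` is hypothetical, the function codes are those quoted in Step 1 and are assumed to be octal, and bit 0 is assumed to be the most significant bit of the 64-bit word.

```python
# Panel function codes quoted in Step 1, assumed to be octal. The last one
# also increments TMA after the deposit.
QUARTER_FN = [0o1010, 0o1030, 0o1050, 0o1370]


def deposit_sequence(word64):
    """Return the (SWR value, FN value) pairs that deposit one program word."""
    steps = []
    for quarter, fn in enumerate(QUARTER_FN):
        shift = 48 - 16 * quarter            # quarter 0 holds bits 0-15 (MSB end)
        swr = (word64 >> shift) & 0xFFFF
        steps.append((swr, fn))
    return steps


steps = deposit_sequence(0x0123456789ABCDEF)
```

A host routine would write each SWR value to the switch register and then the paired code to the function register, repeating the sequence once for each of the three bootstrap words.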
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Step 2: 


Set TMA to the address in AP program memory where the bootstrap is to 
load the program. For this example, this address is 200: 


   200 --> SWR       Put 200 in the switches. 
   1003 --> FN       Put 1003 into the function register 
                     (causes a deposit into TMA). 


Step 3: 

Start the bootstrap program running in the AP. 
Set the switches to 0 and do a start. 


   0 --> SWR 
   40000 --> FN      Start the AP at location 0. 


The bootstrap program (listed in step 5) spins while waiting 
for words to come across the DMA from the host computer. 


Step 4: 


Start the DMA transfer from host memory into the AP. For this example, 
it is assumed that the program is in host memory at location 20000. 
The program to be loaded is 200 AP program words (or 800 16-bit host 
words) long. The actual host memory location and length could be any 
particular value. 


   20000 --> HMA     Set host DMA address to 20000. 

   800 --> WC        Set word count to 800 host words 
                     (assuming a 16-bit host word width). 

   201 --> CTL       Start the DMA. 


Note in particular the CTL bits. Bit 15 initiates the DMA and bit 8 
requests consecutive memory cycles from the host. Because bits 9 and 
10 are left clear, the transfer is set to go to the AP, but not into 
main data memory. Instead, the data goes only as far as the formatter, 
which the bootstrap reads. If bit 4 is set, the host computer is 
interrupted when the DMA is done. 
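
The word count and control value used in this step can be derived rather than memorized. The following Python sketch is an illustration only: the helper names are hypothetical, bit 0 of CTL is assumed to be the most significant bit of a 16-bit word, and the CTL value 201 is assumed to be octal.

```python
AP_PROGRAM_WORD_BITS = 64
HOST_WORD_BITS = 16


def host_word_count(ap_program_words):
    """Number of 16-bit host words needed for a block of 64-bit program words."""
    return ap_program_words * AP_PROGRAM_WORD_BITS // HOST_WORD_BITS


def ctl_value(*bits):
    """OR together the masks of the given CTL bits (bit 0 assumed to be the MSB)."""
    value = 0
    for bit in bits:
        value |= 1 << (15 - bit)
    return value


wc = host_word_count(200)            # 200 program words of four host words each
ctl = ctl_value(8, 15)               # consecutive cycle + host DMA start
ctl_with_irq = ctl_value(4, 8, 15)   # same, plus host interrupt when DMA is done
```

Under these assumptions `wc` is 800 and `ctl` prints as octal 201, the two numbers loaded in this step; adding bit 4 gives octal 4201.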
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Step 5: 


Finally, the three-word bootstrap program is ready to run in the AP. 
1.  LDDA; DB=4              "set DEVICE ADDRESS to 4 


This instruction sets the device address register so that future I/O 
instructions refer to device no. 4, which is the DMA formatter (where 
the data from the host computer ends up). 


2.  LOOP: SPININ;           "wait for some data 
          DB=INBS;          "get the data 
          LPSLT             "put it into the left half of P.S. 


The SPININ causes the processor to hang until the current I/O device 
address (in this case, the DMA formatter) has some new data. Then, to 
read that data, the DB=INBS puts the input data onto the data pad bus. 
The LPSLT puts what is on the data pad bus into the left half (bits 0 
through 31) of the program memory location pointed at by the TMA 
register. 


Two points should be considered: 


• The formatter is 32 bits wide on the AP end; every time 
  the interface receives 32 bits of data from the host 
  computer, the SPININ stops waiting, and another 32 bits of 
  data are processed. Since the program words loaded are 
  64 bits wide, they are halved (left, right, left, right, 
  etc.) and stored accordingly into program memory. 


• TMA is used as a pointer indicating where the bootstrap 
  should place the program it is loading; thus, the LPSLT 
  puts the program words into the proper place. 


3.  SPININ;                 "wait for data 
    DB=INBS;                "get the data 
    LPSRT;                  "put it into the right half 
    INCTMA;                 "increment pointer 
    BR LOOP                 "go back for more 


This does basically the same as no. 2 above except that it processes 
the right half (bits 32-63) of a 64-bit program word. The INCTMA 
increments the storing pointer so that instruction no. 2 stores its 
data into the next word. The branch closes the loop, which then waits 
for more program half-words. 
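
The data flow of the bootstrap loop can be mimicked in Python. This is a sketch of the behavior described above, not of the microcode itself: `bootstrap_load` is a hypothetical name, program memory is modeled as a dictionary, and each list element stands for one 32-bit half-word delivered by the formatter.

```python
def bootstrap_load(half_words, program_memory, tma):
    """Store alternating left/right 32-bit half-words as 64-bit program words.

    Returns the value of TMA after the last word has been stored.
    """
    if len(half_words) % 2:
        raise ValueError("program half-words must arrive in left/right pairs")
    for i in range(0, len(half_words), 2):
        left = half_words[i] & 0xFFFFFFFF        # LPSLT: bits 0-31
        right = half_words[i + 1] & 0xFFFFFFFF   # LPSRT: bits 32-63
        program_memory[tma] = (left << 32) | right
        tma += 1                                 # INCTMA
    return tma


program_memory = {}
next_tma = bootstrap_load([0x11111111, 0x22222222, 0x33333333, 0x44444444],
                          program_memory, 0o200)
```

Four half-words loaded at address 200 (octal here) fill two consecutive program words and leave the pointer at 202, which is the left, right, left, right pattern described in the first of the two points above.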
Step 6: 


Back in the host, waiting for the DMA transfer to finish is 
accomplished by: 


• reading the CTL register 
• testing for bit 15 (the LSB) equal to 1 
• if so, going back to the first item and reading CTL again 


Enabling a host interrupt on DMA completion is also possible. 
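
The polling loop can be sketched as follows. The code is an illustration under stated assumptions: `read_ctl` is a hypothetical callable standing in for whatever host I/O instruction reads the CTL register, and the poll limit is an addition made here so that the sketch cannot spin forever.

```python
HDMA_BUSY = 0x0001   # CTL bit 15, the least significant bit of the word


def wait_for_dma(read_ctl, max_polls=100000):
    """Poll CTL until the host DMA busy bit clears; return the number of reads."""
    for polls in range(1, max_polls + 1):
        if (read_ctl() & HDMA_BUSY) == 0:
            return polls
    raise TimeoutError("DMA still busy after %d polls" % max_polls)


# Simulated register: busy for two reads, then idle.
simulated = iter([0o201, 0o201, 0o200])
polls_needed = wait_for_dma(lambda: next(simulated))
```

On a real host the lambda would be replaced by the machine's own register read; the loop itself is the same read, test, repeat sequence listed above.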


When the transfer is done, the bootstrap program (which would 
otherwise run forever) is stopped with a panel RESET function, and the 
newly-loaded program is started (this example starts at location 200): 


   4000 --> FN       "reset the AP 

   200 --> SWR       "new program address 

   1000 --> FN       "set 200 into PSA 

   20000 --> FN      "continue (from 200) (i.e., start 
                     at AP location 200) 


To set a program breakpoint, the user can set the breakpoint address 
into the SWR and use 20400 (continue + break on PSA) for the final 
panel function. 


NOTE 


The simplest way for the running AP 
program to indicate to the host computer 
that it is done with its task is to HALT. 
When this happens, bit 0 in the panel 
function register is set (which the host 
can test for) or a host interrupt can be 
enabled (CTL bit 3). 
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APPENDIX A 


AP REGISTERS/DATA PATH NAMES 


Table A-1 Registers and Data Paths 


mnemonic    width      name 

SP          16 bits    scratch pad registers (16) 
SPD         4          s-pad destination address register 
SPFN        16         scratch pad ALU/shifter function output 
PNLBS       16         panel bus 
SWR         16         panel switch register 
LITES       16         panel display register 
APSTATUS    16         AP status register 
PS          64         program source memory 
CB          64         command buffer 
PSA         16         program source address register 
SRS         16         subroutine return stack 
SRA         16         subroutine return stack pointer 
DPX         38         data pad X registers (32) 
DPY         38         data pad Y registers (32) 
DB          38         data pad bus 
DPA         16         data pad address register 
TM          38         table memory output register 
TMA         16         table memory address register 
MD          38         data memory output register 
MI          38         data memory input register 
MA          16         memory address register 
A1          38         floating adder input register no. 1 
A2          38         floating adder input register no. 2 
FA          38         floating adder output register 
M1          38         floating multiplier input register no. 1 
M2          38         floating multiplier input register no. 2 
FM          38         floating multiplier output register 
IODEVICE    -          I/O device 
DA          16         I/O device address 
INBS        38         I/O input bus 
IODRDY      1          I/O data ready flag 
A           1          I/O device condition A flag 
B           1          I/O device condition B flag 
Subscripts, written here in square brackets, indicate addressing within 
a memory element (e.g., PS[PSA] means the location in program source 
memory pointed to by the program source address register). 


Superscripts, written here after a caret, indicate portions of a word 
(e.g., A2^E means the exponent portion of the A2 register). 


Parentheses around a symbol indicate the contents of a register (e.g., 
(A1) means the contents of the A1 register). 


Table A-2 AP Internal Status Register 


[Register diagram not reproduced: a 16-bit word, bits 0 through 15. 
The bit assignments are listed below.] 


bits   mnemonic   meaning 

0      OVF        Set when the current adder or multiplier result 
                  (FA or FM) has overflowed. Overflow occurs when 
                  an exponent value is increased above 511. The 
                  offending result is set to the signed maximum 
                  value of (1 - 2^-27) * 2^511, which is roughly 
                  6.7 * 10^153. This bit remains set until cleared 
                  by the microprogram or host computer. 

1      UNF        Set when the current adder or multiplier result 
                  (FA or FM) has underflowed. Underflow occurs when 
                  an exponent value is decreased below -512. The 
                  minimum legal magnitude which numbers can take 
                  without underflowing is roughly 3.7 * 10^-155. 
                  The offending value is set to zero. This bit 
                  remains set until cleared by the microprogram 
                  or host computer. 
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Table A-2 Internal Status Register (cont.) 


bits   mnemonic   meaning 

2      DIVZ       A divide by zero has occurred. The result 
                  was set to the value of the dividend. This 
                  bit remains set until cleared by the 
                  microprogram or host computer. 

3      FZ         Set when the current adder result (FA) 
                  is zero. 

4      FN         Set when the current adder result (FA) 
                  is negative. 

5      Z          Set when the current s-pad function (SPFN) 
                  is zero. 

6      N          Set when the current s-pad function (SPFN) 
                  is negative. 

7      C          S-pad carry bit. If no s-pad shift is 
                  specified, carry is the carry bit from the 
                  s-pad ALU. If a shift is specified, carry 
                  is the last bit shifted off the end of the 
                  s-pad result by the shift. 

8      PERR       (Optional). Set when a main data memory parity 
                  error has occurred. Three parity bits are 
                  used, one each to check the exponent, high 
                  mantissa, and low mantissa portions of the 
                  memory word. If PENB is set, the processor 
                  halts on this error. (See Page Select/ 
                  Parity Option Manual (FPS 860-7365-000) 
                  for more information.) 

9      PENB       (Optional). Enables halt on memory parity error. 
                  If set, the processor halts when a memory 
                  parity error is detected. 

10     SRAO       Subroutine return stack overflow. Set if 
                  more than 16 levels of nested subroutine 
                  calls occur. 


Table A-2 Internal Status Register (cont.) 


bits    mnemonic      meaning 

11      IFFT          Inverse FFT flag. When set in conjunction 
                      with the FFT flag, bit 12, roots of 
                      unity table references are interpreted 
                      as positive angles. 

12      FFT           FFT flag. When set, table memory 
                      addresses are interpreted as negative 
                      angles referencing the roots of unity 
                      table contained in table memory. 

13-15   bit reverse   15 - log2(N), where N is the length of a 
                      complex data array to which the s-pad 
                      address bit reverse operator is being applied. 
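
The overflow and underflow magnitudes quoted for the OVF and UNF bits can be checked with a few lines of Python. The sketch assumes a mantissa with 27 fraction bits (which is what the factor 1 - 2^-27 suggests) and a smallest normalized mantissa magnitude of one half; neither figure is stated explicitly in this table.

```python
MANTISSA_FRACTION_BITS = 27   # assumed from the (1 - 2^-27) factor
MAX_EXPONENT = 511
MIN_EXPONENT = -512

# Largest magnitude: a mantissa of all ones at the largest exponent.
largest = (1 - 2.0 ** -MANTISSA_FRACTION_BITS) * 2.0 ** MAX_EXPONENT

# Smallest normalized magnitude: mantissa 1/2 (assumed) at the smallest exponent.
smallest = 0.5 * 2.0 ** MIN_EXPONENT

largest_text = f"{largest:.1e}"
smallest_text = f"{smallest:.1e}"
```

Both results fit comfortably in a 64-bit host float, so the check can be run on any machine; rounded to two significant digits they agree with the values given for bits 0 and 1.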
[Block diagram not reproduced. It shows the AP functional units 
(program source memory, s-pad registers with the integer ALU/shifter, 
data pad X and Y, main data memory, table memory, the floating-point 
adder and the floating-point multiplier, the panel, and the formatter 
and host interface port) connected by the data pad bus (DPBS, 38 
bits), the input bus (INBS, 38 bits), and the panel bus (PNLBS, 16 
bits). The floating-point adder functions named in the diagram are 
A1+A2, A1-A2, A2-A1, ABS(A2), A1 EQV A2, A1 AND A2, A1 OR A2, and 
FIX A2.] 
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Figure A-1 AP Functional Units 
Table A-3 AP Instruction Summary 


UNCONDITIONAL FIELDS   Each of the following fields may be used in any given instruction word. 


CODE  B    SOP   SOP1    SH   SPS      SPD      FADD   FADD1   A1    A2 

0     NOP  SOP1  NOP     NOP  (S-Pad   (S-Pad   FADD1  NOP     NOP   NOP 
1     &    SPEC  WRTEXP  L    Source   Dest.    FSUBR  FIX     FM    FA 
2          ADD   WRTHMN  RR   Reg.)    Reg.)    FSUB   FIXT    DPX   DPX 
3          SUB   WRTLMN  R                      FADD   FSCLT   DPY   DPY 
4          MOV   NOP          (0-17)   (0-17)   FEQV   FSM2C   TM    MD 
5          AND   NOP                            FAND   F2CSM   ZERO  ZERO 
6          OR    NOP                            FOR    FSCALE  ZERO  MDPX 
7          EQV   NOP                            I/O    FABS    ZERO  EDPX 
10               CLR 
11               INC 
12               DEC 
13 
14 
15               LDSPE 
16               LDSPI 
17               LDSPT 


CODE  COND    DISP       DPX  DPY  DPBS    XR      YR      XW      YW      FM 

0     NOP     (Branch    NOP  NOP  ZERO    (DPX    (DPY    (DPX    (DPY    NOP 
1     +       Displace-  DB   DB   INBS    Read    Read    Write   Write   FMUL 
2     BR      ment)      FA   FA   VALUE*  Index)  Index)  Index)  Index) 
3     BINTRQ  (0-37)     FM   FM   DPX 
4     BION                         DPY     (0-7)   (0-7)   (0-7)   (0-7) 
5     BIOZ                         MD 
6     BFPE                         SPFN 
7     RETURN                       TM 
10    BFEQ 
11    BFNE 
12    BFGE 
13    BFGT 
14    BEQ 
15    BNE 
16    BGE 
17    BGT 


CODE  M1    M2    MI    MA      DPA      TMA 

0     FM    FA    NOP   NOP     NOP      NOP 
1     DPX   DPX   FA    INCMA   INCDPA   INCTMA 
2     DPY   DPY   FM    DECMA   DECDPA   DECTMA 
3                 DB    SETMA   SETDPA   SETTMA 


* This instruction uses a 16-bit immediate VALUE as a constant or address (in bits 48-63 
of this instruction). The YW, FM, M1, M2, MI, TMA and DPA fields are then disabled for 
this instruction word. 
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Table A-4 SPEC Fields 


SPEC FIELDS   One of the SPEC fields may be used per instruction word. The S-pad fields (B, 
SOP, SOP1, SH, SPS, and SPD) are then disabled for this instruction. 


CODE  SPEC     STEST  HOSTPNL  SETPSA  PSEVEN  PSODD   PS      SETEXIT 

0     STEST    BFLT   PNLLIT   JMPA*   RPS0A*  RPS1A*  RPSLA*  NOP 
1     HOSTPNL  BLT    DBELIT   JSRA*   RPS2A*  RPS3A*  RPSFA*  SETEXA* 
2     SPMDA    BNC    DBHLIT   JMP*    RPS0*   RPS1*   RPSL*   NOP 
3     NOP      BZC    DBLLIT   JSR*    RPS2*   RPS3*   RPSF*   SETEX* 
4     NOP      BDBN   NOP      JMPT    RPS0T   RPS1T   RPSLT   NOP 
5     NOP      BDBZ   NOP      JSRT    RPS2T   RPS3T   RPSFT   SETEXT 
6     NOP      BIFN   NOP      JMPP    NOP     NOP     RPSLP   NOP 
7     NOP      BIFZ   NOP      JSRP    NOP     NOP     RPSFP   SETEXP 
10    SETPSA   NOP    SWDB     NOP     WPS0A*  WPS1A*  LPSLA*  NOP 
11    PSEVEN   NOP    SWDBE    NOP     WPS2A*  WPS3A*  LPSRA*  NOP 
12    PSODD    NOP    SWDBH    NOP     WPS0*   WPS1*   LPSL*   NOP 
13    PS       NOP    SWDBL    NOP     WPS2*   WPS3*   LPSR*   NOP 
14    SETEXIT  BFL0   NOP      NOP     WPS0T   WPS1T   LPSLT   NOP 
15    NOP      BFL1   NOP      NOP     WPS2T   WPS3T   LPSRT   NOP 
16    NOP      BFL2   NOP      NOP     NOP     NOP     LPSLP   NOP 
17    NOP      BFL3   NOP      NOP     NOP     NOP     LPSRP   NOP 


* This instruction uses a 16-bit integer VALUE (in bits 48-63 of the instruction word). The 
YW, FM, M1, M2, MI, MA, TMA, and DPA fields are then disabled for this instruction word. 


Table A-5 I/O Fields 


I/O FIELDS   One of the I/O fields may be used per instruction word. The floating adder fields 
(FADD, FADD1, A1, and A2) are then disabled for this instruction word. 


CODE  I/O      LDREG  RDREG  INOUT   SENSE   FLAG  CONTROL 

0     LDREG    NOP    RPSA   OUT     SNSA    SFL0  HALT 
1     RDREG    LDSPD  RSPD   SPNOUT  SPINA   SFL1  IORST 
2     SPMDAV   LDMA   RMA    OUTDA   SNSADA  SFL2  INTEN 
3     NOP      LDTMA  RTMA   SPOTDA  SPNADA  SFL3  INTA 
4     INOUT    LDDPA  RDPA   IN      SNSB    CFL0  REFR 
5     SENSE    LDSP   RSPFN  SPININ  SPINB   CFL1  WRTEX 
6     FLAG     LDAPS  RAPS   INDA    SNSBDA  CFL2  WRTMAN 
7     CONTROL  LDDA   RDA    SPINDA  SPNBDA  CFL3  NOP 
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APPENDIX B 


INSTRUCTION SUMMARY 
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Table B-2 S-pad Group 


FIELD   OCTAL CODE   MNEMONIC   EFFECT 

B       0            NOP        No-op 
        1            &          Use SP[SPD] (bit-reversed) 

SOP     0            SOP1       See SOP1 field 
        1            SPEC       See Special Operations Group 
        2            ADD        (SP[SPD]) + (SP[SPS]) --> SPFN 
        3            SUB        (SP[SPD]) - (SP[SPS]) --> SPFN 
        4            MOV        (SP[SPS]) --> SPFN 
        5            AND        (SP[SPD]) AND (SP[SPS]) --> SPFN 
        6            OR         (SP[SPD]) OR (SP[SPS]) --> SPFN 
        7            EQV        (SP[SPD]) XOR (SP[SPS]) --> SPFN 

SH      0            NOP        No-op 
        1            L          SPFN*2 --> SPFN (left shift) 
        2            RR         SPFN/4 --> SPFN (double right shift) 
        3            R          SPFN/2 --> SPFN (right shift) 

SPS     0-17                    S-Pad Source Operand Address 

SPD     0-17                    S-Pad Destination Address, SPFN --> 
                                SP[SPD] unless inhibited by No Load 
                                (COND = 1) 


NOTE 


These are logical shifts: 

   Right shift:   0 --> [bits 0-15] --> C 
   Left shift:    C <-- [bits 0-15] <-- 0 
Table B-2 S-pad Group (cont.) 


FIELD   OCTAL CODE   MNEMONIC   EFFECT (see NOTE) 

SOP1    1            WRTEXP     Restricts DPX, DPY & MI fields to 
                                Write Exponent Only 
        2            WRTHMN     Restricts DPX, DPY & MI fields to 
                                Write High Mantissa Only (Bits 00-11) 
        3            WRTLMN     Restricts DPX, DPY & MI fields to 
                                Write Low Mantissa Only (Bits 12-27) 
        14           LDSPNL     (SP[SPD]) --> SPFN, PNLBS --> SP[SPD] 
        15           LDSPE      (SP[SPD]) --> SPFN, DPBS^E - 512 --> SP[SPD] 
        16           LDSPI      (SP[SPD]) --> SPFN, DPBS^ML --> SP[SPD] 
        17           LDSPT      (SP[SPD]) --> SPFN, DPBS^MT --> SP[SPD] 


NOTE 


MH = Mantissa High = Mantissa bits 00-11 

ML = Mantissa Low = Mantissa bits 12-27 

MT = Mantissa bits for table lookups = Mantissa bits 02-98 
E = Exponent 
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Table B-3 Special Operations Group 


id. 13 


HOSTPNL 
SETPSA 
PSEVEN 


SETEXIT 


. OCTAL F 
FIELO pa MNEMON IC EFFECT 


SPMDA . Spin until MD available 


1g _ See SETPSA Field, inhibit TEST except 
No Load (8-8) 
See PSODD Field (8-19) 


—_ 
n 
yr 8 H 
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Table B-3 Special Operations Group (cont.) 


I ate OCTAL sek a 
FIELD CODE MNEMON IC EFFECT (see NOTE) 
STEST a BFLT Branch if FA<9.2 


Branch 


1S BFLI Branch i 
Branch i 
NOTE 


If the above specified condition is true OR 


if 


if 


S-Pad carry dit 


u 
a 


08 <9.2 


DB positive and unnormalized 


u 
oy 


Inverse FFT flag 


Inverse FFT flag 


Flag @ =1 

Flag l= 1 

Fiag 221 _ 

si 321 _ 


the condition specified in the COND field is 


true, a branch occurs to (PSA) + OISP-29. 


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1340 


Table B-3 Special Operations Group (cont.) 


FIELD     OCTAL CODE   MNEMONIC   EFFECT (see NOTE 1) 

HOSTPNL   0            PNLLIT     PNLBS --> LITES 

          11           SWDBE      (SWR) --> PNLBS --> DB^E and WRTEXP 
                                  (see NOTE 2) 

          12           SWDBH      (SWR) --> PNLBS --> DB^MH and WRTHMAN 
                                  (see NOTE 2) 

          13           SWDBL      (SWR) --> PNLBS --> DB^ML and WRTLMAN 
                                  (see NOTE 2) 


NOTE 


1) MH = Mantissa High = Mantissa bits 00-11 
   ML = Mantissa Low = Mantissa bits 12-27 
   E = Exponent 


2) Restricts DPX, DPY and MI to: 
   WRTEXP: Write Exponent only 
   WRTHMAN: Write High Mantissa only (bits 00-11) 
   WRTLMAN: Write Low Mantissa only (bits 12-27) 
Table B-3 Special Operations Group (cont.) 


FIELD    OCTAL CODE   MNEMONIC   EFFECT (see NOTE) 

SETPSA   0            JMPA       VALUE --> PSA 

         1            JSRA       (SRA) + 1 --> SRA, (PSA) + 1 --> SRS[SRA], 
                                 VALUE --> PSA 

         2            JMP        VALUE + (PSA) --> PSA 

         3            JSR        (SRA) + 1 --> SRA, (PSA) + 1 --> SRS[SRA], 
                                 VALUE + (PSA) --> PSA 

         5            JSRT       (SRA) + 1 --> SRA, (PSA) + 1 --> SRS[SRA], 
                                 (TMA) --> PSA 

         7            JSRP       (SRA) + 1 --> SRA, (PSA) + 1 --> SRS[SRA], 
                                 (SWR) --> PNLBS --> PSA 


NOTE 


VALUE = Bits 48-63 of this instruction (CB48-CB63) 
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Table B-3 Special Operations Group (cont.) 


FIELD pelle MNEMONIC EFFECT (see NOTE 2) 


OSEYEN io on. 8 RPSBA (PSaeine) —» PNLBS —> LITES 

se a 
ag 

a RPSD _ (PSYALuespsa) > PNLBS —eLITES 
Q2 

ee ee FSyaLuespsay = NERS ee tHTEs 

RPSQT | _—(psaB) —>PNLBS —>LITES 

RPS2T (psaz)—> PNLBS —>LITES 


_— 
a ee (SWR)—>PNLBS—ePSHe 
a 
i 
ee ee (SHR)—> PNLBS —>Psie 

. (SHR) —PPNLBS—>PSee, 


2 
ce a 
NOTE 


1) This field requires 2 cycles to execute. 


2) VALUE = Bits 48-63 of this instruction (CB48-CB63) 
   Q0 = Quarter zero of Program Source Word (PS00-PS15) 
   Q2 = Quarter two of Program Source Word (PS32-PS47) 
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OCTAL 
CODE 


i 
ret) 
Co 
ra 
4?) 
wo 


?SODD 
(see NOTE 1) 


- 


wm e pai — _ 
w Nm ed Ss 


re 


ee i 
be | oO wm 


a neni i ae 
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-3 Special Operations Group (cont.) 
MNEMONIC EFFECT (see NOTE 2} 
a a RR LT SE 
Ql 
RPSIA (psGl .) — PNLBS —* LITES 
mel 
RPS3A (PS@3, .) —» PNLBS —>LITES 
Q1 
RPS1 (PSO seapsa) ——P PNLBS —*LITES 
93 
RPS3 (PSC3 capa) ——>PNLBS —* LITES 
RPSIT (psd) —» PNLBS—> LITES 
RPS3T (ps3) —» PNLBS —+ LITES 


Ql 


WPSIA (SWR) ——-> PNLBS PS ALUE 


—WPS3A (SWR)——» PNLBS —> Scie 


Ql 
WPS1 (SWR)——®PNLBS —>PSy 5) eon 


(SWR) —» PNLBS ——» Pse? 


WPS3 VALUE+PSA 


Ql 
TMA 


Q3 
(SWR) ——>PNLBS ——> PSrayq 


WPS1T (SWR) ———» PNLBS ——>PS 


WPS3T 


NOTE 


1) This field requires 2 cycles to execute. 


2) VALUE = Bits 48-63 of this instruction (CB48-CB63) 
   Q1 = Quarter one of Program Source Word (PS16-PS31) 
   Q3 = Quarter three of Program Source Word (PS48-PS63) 
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Table B-3 Special Operations Group (cont.) 


FIELO pA MNEMONIC EFFECT (see NOTE 2) 


a a ae RPSLA Se) —>B 
a NOTE 1) 
se RPSFA (PSraL jg) P08 
= (5th ueeosn) — 
3 RPSF (pst P ) -——»0B 
| YALUE+PSA 
RPSLP ane js OB 
| PNLBS 
19 LPSLA pp ——»pst 
VALUE 
LH 
: RH 
13 LPSR DB ——>PSia yea psa 
LH 
14 LPSLT DB —>PSti, 
RH 
LH 
RH 
LPSRP 08 —»pset 


NOTE 


1) This field requires 2 cycles to execute. 


2) VALUE = Bits 48-63 of this instruction (CB48-CB63) 
   LH = Left half of Program Source Word (Bits 00-31) 
   RH = Right half of Program Source Word (Bits 32-63) 
   FP = Program Source bits 26-63, used for floating-point literals 
Table B-3 Special Operations Group (cont.) 


FIELD     OCTAL CODE   MNEMONIC   EFFECT (see NOTE) 

SETEXIT   1            SETEXA     VALUE --> SRS[SRA] 

          3            SETEX      VALUE + (PSA) --> SRS[SRA] 

          5            SETEXT     TMA --> SRS[SRA] 

          7            SETEXP     PSA + 1 --> SRS[SRA] 


NOTE 


Sets the current subroutine return address as indicated above. 
SRA does not change. 
VALUE = Bits 48-63 of this instruction. 
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Table B-4 Floating Adder Group 


FIELD   OCTAL CODE   MNEMONIC    EFFECT 

FADD    0            FADD1       See FADD1 field 
        4            FEQV        Logical equivalence: (A1) XOR (A2) 
        5            FAND        Logical and: (A1) AND (A2) 
        6            FOR         Logical or: (A1) OR (A2) 

A1      2            DPX(IDX)    (DPX[DPA+IDX]) --> A1, where XR = IDX+4 
        3            DPY(IDX)    (DPY[DPA+IDX]) --> A1, where YR = IDX+4 

= 


NOTE

All floating adder op-codes:

1) Align exponents
2) Perform the specified arithmetic, logical, or shift operation
3) Normalize
4) Convergently round
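
The four steps listed in the note can be sketched in Python for the addition case. This is a simplified model, not the hardware algorithm: it assumes a 28-bit mantissa, represents each operand as a (mantissa, exponent) pair meaning mantissa * 2**exponent, aligns exactly onto the smaller exponent using Python's unbounded integers, and takes "convergent rounding" to mean rounding ties to the even neighbor. The names fadd and round_half_even_shift are hypothetical.

```python
MANT_BITS = 28  # assumed mantissa width for this sketch


def round_half_even_shift(mag: int, shift: int) -> int:
    """Drop `shift` low bits of a non-negative integer, ties to even.

    A non-positive shift means a left shift, which is exact.
    """
    if shift <= 0:
        return mag << -shift
    q, r = divmod(mag, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q


def fadd(a, b):
    """Add two (mantissa, exponent) pairs; value = mantissa * 2**exponent."""
    (ma, ea), (mb, eb) = a, b
    # 1) Align exponents (done exactly here, onto the smaller exponent).
    e = min(ea, eb)
    ma <<= ea - e
    mb <<= eb - e
    # 2) Perform the specified operation (addition in this sketch).
    m = ma + mb
    if m == 0:
        return (0, 0)
    sign, mag = (-1 if m < 0 else 1), abs(m)
    # 3) Normalize to MANT_BITS significant bits and
    # 4) convergently round the bits that are dropped.
    shift = mag.bit_length() - MANT_BITS
    mag = round_half_even_shift(mag, shift)
    e += shift
    if mag.bit_length() > MANT_BITS:  # rounding carried into a new bit
        mag >>= 1
        e += 1
    return (sign * mag, e)


if __name__ == "__main__":
    m, e = fadd((3, 0), (5, -1))   # 3 + 2.5
    print(m * 2.0 ** e)
```

Rounding to even keeps tie errors from accumulating in one direction over long chains of cascaded operations.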
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Table B-4 Floating Adder Group (cont.) 


FIELD   OCTAL   MNEMONIC   EFFECT
        CODE

A2                         (A2) -> A2
                DPX(IDX)   (DPX[DPA+IDX]) -> A2, where XR = IDX+4
                DPY(IDX)   (DPY[DPA+IDX]) -> A2, where YR = IDX+4
                           (MD) -> A2
                           0.0 -> A2
                           SPFN+512 -> exponent of A2,
                           (mantissa of DPX[DPA+IDX]) -> mantissa of A2
                           (exponent of DPX[DPA+IDX]) -> exponent of A2,
                           SPFN -> mantissa of A2 (bits 00-01),
                           0 -> mantissa of A2 (bits 02-27)

                           No-op
                           Convert (A2) to an integer
                           Convert (A2) to an integer (result truncated)
                           Shift (A2) right and increment the exponent of
                           A2 until the exponent of A2 = (SPFN+511)
                           (result truncated).
                           Convert (A2) from signed magnitude to 2's
                           complement.
                           Convert (A2) from 2's complement to signed
                           magnitude.
                FSCALE     Shift (A2) right and increment the exponent of
                           A2 until the exponent of A2 = SPFN+511.
                FABS       Take the absolute value of (A2).
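
The two conversions between signed magnitude and 2's complement listed above can be sketched in Python. This is an illustration only: it assumes a 28-bit field with the sign in the most significant bit, and the function names sm_to_2c and tc_to_sm are hypothetical.

```python
BITS = 28                   # assumed field width for this sketch
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)      # most significant bit holds the sign


def sm_to_2c(x: int) -> int:
    """Convert a signed-magnitude bit pattern to 2's complement."""
    mag = x & (SIGN - 1)
    if x & SIGN:
        return (-mag) & MASK  # negative zero maps to plain zero
    return mag


def tc_to_sm(x: int) -> int:
    """Convert a 2's complement bit pattern to signed magnitude."""
    x &= MASK
    if x & SIGN:
        mag = (-x) & MASK
        if mag == SIGN:
            # The most negative 2's complement value has no
            # signed-magnitude counterpart of the same width.
            raise OverflowError("value not representable")
        return SIGN | mag
    return x


if __name__ == "__main__":
    minus_five_sm = SIGN | 5
    print(f"{sm_to_2c(minus_five_sm):#09x}")
    print(f"{tc_to_sm(sm_to_2c(minus_five_sm)):#09x}")
```

Signed magnitude has two encodings of zero and 2's complement has one extra negative value, which is why the two edge cases are handled separately.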
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Table B-5 I/O Group


LDREG
INOUT
SENSE
CONTROL

FIELD   MNEMONIC   EFFECT

I/O                See LDREG field
                   Spin until MD available

        LDAPS      DPBS -> APSTATUS


Table B-5 I/O Group (cont.)


FIELD   MNEMONIC   EFFECT

                   (PSA) -> PNLBS
                   (SPD) -> PNLBS
                   (MA) -> PNLBS
                   (TMA) -> PNLBS
                   (DPA) -> PNLBS
                   SPFN -> PNLBS
                   (APSTATUS) -> PNLBS
                   (DA) -> PNLBS

INOUT   OUT        DPBS -> IODEVICE[DA]
        SPNOUT     SPIN if IODRDY[DA] = 0,
                   DPBS -> IODEVICE[DA]
        OUTDA      DPBS -> IODEVICE[DA], SPFN -> DA
        SPOTDA     SPIN if IODRDY = 0, SPFN -> DA,
                   DPBS -> IODEVICE[DA]
        IN         (IODEVICE[DA]) -> INBS
        SPININ     SPIN if IODRDY[DA] = 0,
                   (IODEVICE[DA]) -> INBS
        INDA       (IODEVICE[DA]) -> INBS, SPFN -> DA
        SPINDA     SPIN if IODRDY[DA] = 0, SPFN -> DA,
                   (IODEVICE[DA]) -> INBS
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Table B-5 I/O Group (cont.)


FIELD   EFFECT (see NOTE)

        A[DA] -> IODRDY, SPIN if IODRDY = 0
        A[DA] -> IODRDY, SPIN if IODRDY = 0,
        SPFN -> DA
        B[DA] -> IODRDY, SPIN if IODRDY = 0
        B[DA] -> IODRDY, SPIN if IODRDY = 0,
        SPFN -> DA

NOTE

A and B are I/O device dependent conditions, either 1 or 0.
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Table B-5 I/O Group (cont.)


FIELD     MNEMONIC   EFFECT

CONTROL   HALT       Halt
          TATEN      If CTL05 is set see Programmer's
                     Reference Manual Part II, page 9.
          INTA       Interrupt acknowledge. Device address of
                     interrupting device put onto DPBS.
          REFR       Memory refresh sync
          WRTEX      Restricts DPX, DPY & MI to write
                     exponent only
          WRTMAN     Restricts DPX, DPY & MI to write mantissa
                     only (Bits 0-27)
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Table B-6 Branch Group 


FIELD   CODE      MNEMONIC       EFFECT (see NOTE 2)

                                 Branch always
        3         BINTRQ         Branch if INTRQ (Interrupt Request
                                 flag = 1)
                  BION           Branch if IODRDY[DA] flag = 1
                  BIOZ           Branch if IODRDY[DA] flag = 0
        6         BFPE           Branch on floating-point arithmetic
                                 error (overflow, underflow, or divide
                                 by zero).
        7
                  RETURN         (SRS[SRA]) -> PSA, (SRA) - 1 -> SRA
                  (see NOTE 1)   (subroutine return jump).
                                 Branch if SPFN > 0

DISP    0 to 37                  If branch condition is true,
        (see NOTE 3)             (PSA) + DISP - 20 -> PSA


NOTE

1) "RETURNS" may not be made in two successive instructions.

2) FA and SPFN are tested as to their state for the previous
   instruction.

3) Thus the effective Branch Range is -20 to +17 relative to
   the current instruction.
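
The displacement arithmetic in the DISP row and NOTE 3 can be checked with a small Python sketch. It assumes the DISP limits and the bias of 20 are octal, as the code columns of these tables are, and that an untaken branch simply continues at (PSA) + 1. Neither assumption is stated on this page, and the name branch_target is hypothetical.

```python
DISP_BIAS = 0o20  # assumed octal: subtracted from DISP to centre the range
DISP_MAX = 0o37   # assumed octal: largest encodable displacement


def branch_target(psa: int, disp: int, condition: bool) -> int:
    """Return the next program source address for a conditional branch."""
    if not 0 <= disp <= DISP_MAX:
        raise ValueError("DISP must be in the range 0 to 37 (octal)")
    if condition:
        return psa + disp - DISP_BIAS
    return psa + 1  # assumption: fall through to the next instruction


if __name__ == "__main__":
    psa = 0o100
    print(oct(branch_target(psa, 0, True)))         # furthest backward
    print(oct(branch_target(psa, DISP_MAX, True)))  # furthest forward
```

A DISP equal to the bias branches to the current instruction itself, which is how a one-instruction wait loop would be encoded under these assumptions.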
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Table B-7 Data Pad Group


FIELD          OCTAL   MNEMONIC      EFFECT
               CODE

DPX                    DPX(IDX)<DB   DPBS -> DPX[DPA+IDX], where XW = IDX+4
(see NOTE 1)           DPX(IDX)<FA   FA -> DPX[DPA+IDX], where XW = IDX+4
                       DPX(IDX)<FM   FM -> DPX[DPA+IDX], where XW = IDX+4

DPY                                  No-op
(see NOTE 1)           DPY(IDX)<DB   DPBS -> DPY[DPA+IDX], where YW = IDX+4
                       DPY(IDX)<FA   FA -> DPY[DPA+IDX], where YW = IDX+4
                       DPY(IDX)<FM   FM -> DPY[DPA+IDX], where YW = IDX+4

DPBS                   DB=ZERO       0.0 -> DB
                       DB=INBS       INBS -> DB
                       DB=VALUE      VALUE -> E of DB, VALUE -> ML of DB,
                                     sign extended into MH of DB
                       DB=DPX(IDX)   (DPX[DPA+IDX]) -> DB, where XR = IDX+4
                       DB=DPY(IDX)   (DPY[DPA+IDX]) -> DB, where YR = IDX+4
                       DB=SPFN       SPFN + 512 -> E of DB, SPFN -> ML of DB,
                                     sign extended into MH of DB


NOTE

1) All bits written unless WRTEXP, WRTHMAN or WRTLMAN set.
   See SOP1 and HOSTPNL field.

2) DPBS forced to 0 if HOSTPNL field = 18
   ML = Mantissa Low (Mantissa Bits 12-27)
   MH = Mantissa High (Mantissa Bits 0-11)
   E = Exponent
   VALUE is a 16-bit 2's complement number, contained in
   bits 48-63 of the instruction word.
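
The mantissa half of the DB=VALUE effect can be sketched in Python: the 16-bit VALUE fills ML (mantissa bits 12-27) and its sign is extended through MH (mantissa bits 0-11). The sketch assumes mantissa bit 0 is the most significant bit and leaves out the exponent part of the effect, whose width is not given on this page. The function names are hypothetical.

```python
ML_BITS = 16  # Mantissa Low: mantissa bits 12-27
MH_BITS = 12  # Mantissa High: mantissa bits 0-11


def value_to_mantissa(value16: int) -> int:
    """Place a 16-bit 2's complement VALUE in ML and sign-extend into MH."""
    ml = value16 & 0xFFFF
    sign_set = bool(ml & 0x8000)
    mh = (1 << MH_BITS) - 1 if sign_set else 0  # copies of the sign bit
    return (mh << ML_BITS) | ml


def mantissa_as_int(mantissa28: int) -> int:
    """Interpret a 28-bit 2's complement mantissa as a Python integer."""
    if mantissa28 & (1 << 27):
        return mantissa28 - (1 << 28)
    return mantissa28


if __name__ == "__main__":
    for v in (5, -5 & 0xFFFF):
        m = value_to_mantissa(v)
        print(f"{v:#06x} -> {m:#09x} ({mantissa_as_int(m)})")
```

Sign extension preserves the numeric value, so a negative literal stays negative once it is widened to the full mantissa.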
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FIELD CODE MNEMONIC
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Table B-7 Data Pad Group (cont.) 


EFFECT

DPX Read EFA is (DPA) + XR - 4
DPY Read EFA is (DPA) + YR - 4
DPX Write EFA is (DPA) + XW - 4
DPY Write EFA is (DPA) + YW - 4,
YW=XW if VALUE is used in another field.
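
The effective address (EFA) rule above, together with the encoding XR = IDX+4 used throughout Table B-7, reduces to EFA = (DPA) + IDX. A minimal Python sketch follows; the helper names are hypothetical, and no wraparound of the Data Pad address is modeled because none is described on this page.

```python
INDEX_BIAS = 4  # XR, YR, XW and YW all hold IDX + 4


def encode_index(idx: int) -> int:
    """Return the XR/YR/XW/YW field value for a Data Pad index IDX."""
    return idx + INDEX_BIAS


def data_pad_efa(dpa: int, field: int) -> int:
    """Effective address: (DPA) + field - 4."""
    return dpa + field - INDEX_BIAS


if __name__ == "__main__":
    dpa = 10
    for idx in (-2, 0, 3):
        xr = encode_index(idx)
        print(f"IDX={idx:+d}  XR={xr}  EFA={data_pad_efa(dpa, xr)}")
```

Storing the index with a bias of 4 lets a non-negative field value express indices on both sides of the location addressed by DPA.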
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Table B-8 Floating Multiplier Group

FIELD   MNEMONIC   EFFECT

                   No-op
                   Multiply: (M1)*(M2)

M1                 FM -> M1
        DPX(IDX)   (DPX[DPA+IDX]) -> M1, where XR = IDX+4
        DPY(IDX)   (DPY[DPA+IDX]) -> M1, where YR = IDX+4
                   (TM) -> M1

M2                 FA -> M2
        DPX(IDX)   (DPX[DPA+IDX]) -> M2, where XR = IDX+4
        DPY(IDX)   (DPY[DPA+IDX]) -> M2, where YR = IDX+4
                   (MD) -> M2

NOTE

These fields are not in effect if VALUE is used in another field.
Arguments that are unnormalized by more than one position will
produce incorrect results.
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Table B-9 Memory Group 


56 57 58 59 60 61 62 63

FIELD          CODE   MNEMONIC   EFFECT (see NOTE 3)
(see NOTE 1)

MI             1      MI<FA      FA -> MI, write MI into Data Memory
                                 (see NOTE 2)
               2      MI<FM      FM -> MI, write MI into Data Memory
                                 (see NOTE 2)
               3      MI<DB      DB -> MI, write MI into Data Memory
                                 (see NOTE 2)

MA             1      INCMA      (MA)+1 -> MA, initiate a Data Memory
                                 cycle
               2      DECMA      (MA)-1 -> MA, initiate a Data Memory
                                 cycle
               3      SETMA      SPFN -> MA, initiate a Data Memory
                                 cycle

DPA                   INCDPA     (DPA)+1 -> DPA
                      DECDPA     (DPA)-1 -> DPA
                      SETDPA     SPFN -> DPA

TMA            1      INCTMA     (TMA)+1 -> TMA, initiate a read from
                                 Table Memory
                      DECTMA     (TMA)-1 -> TMA, initiate a read from
                                 Table Memory
                      SETTMA     SPFN -> TMA, initiate a read from
                                 Table Memory


NOTE

1) These fields are not in effect if a value is used by another
   field. Changes made in MA, TMA, or DPA do not affect the values
   of these registers used by other fields during the current
   instruction.

2) All bits written unless WRTEXP, WRTHMAN or WRTLMAN is set.
   See SOP1 and HOSTPNL fields.

3) DB is used in place of SPFN if LDREG field is used.
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Notice to the Reader 


• Help us improve the quality and usefulness
  of this manual.

• Your comments and answers to the following
  READERS COMMENT form would be appreciated.

To mail: fold the form in three parts so
that Floating Point Systems'
mailing address is visible; seal.

Thank you


READERS COMMENT FORM

Document Title

Your comments and answers will help
us improve the quality and usefulness
of our publications. If your answers
require qualification or additional
explanation, please comment in the
space provided below.

How did you use this manual?

( ) AS AN INTRODUCTION TO THE SUBJECT
( ) AS AN AID FOR ADVANCED TRAINING
( ) TO LEARN OF OPERATING PROCEDURES
( ) TO INSTRUCT A CLASS
( ) AS A STUDENT IN A CLASS
( ) AS A REFERENCE MANUAL
( ) OTHER

Did you find this material...

                       YES   NO
USEFUL?                ( )   ( )
COMPLETE?              ( )   ( )
ACCURATE?              ( )   ( )
WELL ORGANIZED?        ( )   ( )
WELL ILLUSTRATED?      ( )   ( )
WELL INDEXED?          ( )   ( )
EASY TO READ?          ( )   ( )
EASY TO UNDERSTAND?    ( )   ( )

Please indicate below whether your
comment pertains to an addition,
deletion, change or error; and, where
applicable, please refer to specific
page numbers.

Page      Description of error or deficiency

Name                      Title
Firm                      Department
Address                   City, State
Telephone                 Date

First Class
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Portland, 
Oregon 
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